FAQ | Questions about Data Analyses

Are the variables I am looking for included in SOEP?

You can search by topic and browse an overview of all variables (including frequencies) and questionnaires in the interactive program SOEPinfo.

Are more detailed regional data offered?

You will find extensive information on our Regional Data page.

Can data for individual federal states be treated as representative?

You will find extensive information on our Regional Data page.

What are generated variables and when should I make use of them?

Generated variables, like status variables, are intended to simplify work with SOEP data. Their generation rests on specific assumptions, which are described in the documentation. Please see the files $PGEN (PDF, 0.66 MB) and $HGEN (PDF, 0.64 MB) in the documentation (Contact: Joachim Frick).

How can I switch Stata labels from German to English?

Use the Stata command label language EN to switch to the English labels.

How can I identify retired persons in the SOEP data set?

Depending on your research focus, there are several options:

  • Self-reported employment status of the interviewee in the previous year. This information is stored in the calendar file $PKAL and covers employment status as a retiree [variables $P1E01 and $P1E02] as well as receipt of pension or retirement income in the previous year [variables $P2D01 to $P2D03].
  • Age of the interviewee (e.g., derivable from the variable GEBJAHR in PPFAD).
  • Current employment status (not employed), possibly in combination with age. Please be aware that retired persons currently in paid employment are counted as 'employed' in the central employment filter (e.g., UP09).
  • Persons who have retired only recently can be identified via the reason given for termination of their last job (e.g., in 2004: variable UP75, code 6: "in retirement").
  • Beyond the combinations mentioned above, you may also choose others, such as "age > 65" together with "Receiving Retirement Pension".
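
A combination such as "age > 65" together with "Receiving Retirement Pension" could be sketched roughly as follows. This is only an illustration in Python: the records and the field names receives_pension and employed are hypothetical stand-ins for information you would first derive from the SOEP variables named above, and age is approximated as survey year minus year of birth.

```python
# Hypothetical person records for one survey year. The field names are
# illustrative stand-ins, not actual SOEP variable names.
SURVEY_YEAR = 2004

persons = [
    {"persnr": 101, "gebjahr": 1935, "receives_pension": True, "employed": False},
    {"persnr": 102, "gebjahr": 1936, "receives_pension": True, "employed": True},
    {"persnr": 103, "gebjahr": 1970, "receives_pension": False, "employed": True},
    {"persnr": 104, "gebjahr": 1950, "receives_pension": True, "employed": False},
]


def age(person, year=SURVEY_YEAR):
    """Approximate age as survey year minus year of birth."""
    return year - person["gebjahr"]


def is_retired(person, min_age=65, require_not_employed=False):
    """Assumed rule: older than min_age and receiving a pension.

    With require_not_employed=True, pensioners who are currently in paid
    employment are excluded as well.
    """
    if age(person) <= min_age:
        return False
    if not person["receives_pension"]:
        return False
    if require_not_employed and person["employed"]:
        return False
    return True


retired = [p["persnr"] for p in persons if is_retired(p)]
retired_strict = [
    p["persnr"] for p in persons if is_retired(p, require_not_employed=True)
]

print(retired)
print(retired_strict)
```

The stricter variant reflects the caveat above that retired persons in paid employment count as 'employed'; which definition is appropriate depends on your research question.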

Can the sample classification of persons within the survey change over time? Does an interviewed person in the foreigners sample remain there if he/she changes citizenship?

As a matter of principle, the sample classification does not change, either through a change in citizenship or through a move to another sample region (from west to east Germany or vice versa). The person remains in the foreigner, west or east sample. Current citizenship (NATION$$) and current residential region ($SAMPREG) can easily be identified from these variables.

Which variable contains a correct regional classification of those surveyed ($SAMPREG vs. PSAMPLE or HSAMPLE)?

Since the beginning of SOEP, numerous persons within the survey have moved from east Germany to west Germany and, on a smaller scale, vice versa. Analyses with a regional focus can therefore be severely distorted if the variable PSAMPLE (which indicates sample classification) is used.

(PSAMPLE can be found in PPFAD: 1 = subsample A, 2 = subsample B, 3 = subsample C, 4 = subsample D (Immigrants), 5 = subsample E (supplementary sample from 1998 onwards), 6 = subsample F (innovation sample from 2000 onwards)).

A correct regional classification of persons within the survey can only be achieved with the use of the time-dependent variables $SAMPREG in PPFAD and HPFAD (1 = West Germany, 2 = East Germany).

Since 1990 the west German and east German populations have been determined in $SAMPREG irrespective of the sample classification. We therefore recommend always using this value for regional analysis!

The following table (PDF, 5.69 KB), created by cross-tabulating $SAMPREG and PSAMPLE, gives an insight into the extent of regional mobility since 1990 (basis: all persons with $NETTO=1 (person interviews) or $NETTO=2 (children up to 16 years) in surveyed households).
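
A cross-tabulation of this kind can be sketched in a few lines of Python. The records below are invented for illustration, the field name sampreg stands for one wave's $SAMPREG value, and using subsample C as a proxy for east Germany is an assumption made only to show how the two selections diverge.

```python
from collections import Counter

# Invented example records: psample follows the coding given above
# (1 = subsample A, 3 = subsample C), sampreg stands for one wave's
# $SAMPREG value (1 = West Germany, 2 = East Germany).
records = [
    {"persnr": 1, "psample": 1, "sampreg": 1},
    {"persnr": 2, "psample": 3, "sampreg": 2},
    {"persnr": 3, "psample": 3, "sampreg": 1},  # subsample C, now living in the west
    {"persnr": 4, "psample": 1, "sampreg": 2},  # subsample A, now living in the east
    {"persnr": 5, "psample": 3, "sampreg": 2},
]

# Cross-tabulate sample membership against current region.
crosstab = Counter((r["psample"], r["sampreg"]) for r in records)

# Selecting "east Germans" by sample membership (assumed proxy: subsample C)
# versus by current region gives different groups.
east_by_sample = [r["persnr"] for r in records if r["psample"] == 3]
east_by_region = [r["persnr"] for r in records if r["sampreg"] == 2]

for (psample, sampreg), n in sorted(crosstab.items()):
    print(f"PSAMPLE={psample} SAMPREG={sampreg}: {n}")
print(east_by_sample)
print(east_by_region)
```

In this toy data the two selections differ in two persons, which is the kind of distortion the recommendation to use $SAMPREG is meant to avoid.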

Which variables contain the correct classification of German and non-German persons surveyed in the individual samples A-F (NATION$$ vs. PSAMPLE)?

As with the issue above ($SAMPREG vs. PSAMPLE), sample B is often equated with the population of "foreigners" surveyed by the SOEP, and sample A with "Germans". For the most part this is correct, but it is imprecise and becomes less accurate over time.

At the beginning of the SOEP in 1984, the nationality of the head of household determined the classification into samples A and B. Other household members may nevertheless have a nationality different from that of the head of household. In addition, sample A contains foreigners whose nationality was not represented in sample B. The samples differ enormously in this respect: up to the year 2000, sample C consists almost exclusively of persons with German nationality, while sample D contains a relatively large number of Germans due to its high share of ethnic German immigrants.

An ex-ante classification of the respective persons into "German" and "non-German" is impossible in the latest samples E and F due to the sample designs.

The following table (PDF, 6.3 KB), created by cross-tabulating the recoded information contained in NATION$$ (1 = German, 2 = non-German including item non-response) with PSAMPLE, gives an insight into the heterogeneity of the SOEP samples with regard to nationality composition since 1984 (basis: all persons with $NETTO=1 (person interview)).
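
The recoding and cross-tabulation described here might look roughly like the following Python sketch. The records are invented, and the raw nationality codes are assumptions for illustration only (1 taken to mean German, negative values taken to mean item non-response); please check the actual coding of NATION$$ in the documentation.

```python
from collections import Counter

GERMAN_CODE = 1  # assumed raw code for German nationality


def recode_nation(nation):
    """Recode a raw nationality code to 1 = German, 2 = non-German.

    Everything other than the assumed German code, including missing
    values (item non-response), falls into category 2.
    """
    return 1 if nation == GERMAN_CODE else 2


# Invented example records; netto stands for one wave's $NETTO value.
records = [
    {"psample": 1, "nation": 1, "netto": 1},
    {"psample": 1, "nation": 2, "netto": 1},
    {"psample": 2, "nation": 2, "netto": 1},
    {"psample": 2, "nation": 1, "netto": 1},
    {"psample": 4, "nation": 1, "netto": 1},
    {"psample": 4, "nation": -1, "netto": 1},  # item non-response
    {"psample": 4, "nation": 2, "netto": 2},   # no person interview, excluded
]

# Basis: persons with a person interview (netto == 1).
interviewed = [r for r in records if r["netto"] == 1]

table = Counter((r["psample"], recode_nation(r["nation"])) for r in interviewed)

for (psample, group), n in sorted(table.items()):
    print(f"PSAMPLE={psample} group={group}: {n}")
```

Even in this small example, samples A and B each contain both groups, which illustrates why sample membership alone is not a reliable indicator of nationality.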

What should I bear in mind with regard to the new survey method CAPI?

As part of a random split of the sample, the new survey method CAPI (Computer Assisted Personal Interview) was used in about half of the cases in sample E. These interviews can be identified via the variables $PFORM* in $PBRUTTO or $HFORM* in $HBRUTTO.

Early analysis shows no signs of any significant method effects, i.e. the results do not appear to have been influenced by the method of data collection. Further analysis by users regarding survey methods would naturally be advisable.

Since 2001, this survey method has been increasingly adopted for the old subsamples A to D, as well as F.
