Changes in the Dataset

Data Updates

Data distribution 2008 (Wave Y)

The new dataset (Waves 1-25, 1984-2008) contains extensive improvements, additions, and modifications. It provides the usual wave-specific data YPBRUTTO, YP, YPKAL, YPGEN, YHBRUTTO, YH, YHGEN, YKIND, and XPLUECKE as well as the updated files with a longitudinal component (PFAD files, biography files, spell data, and weighting factors).
We now also provide, as a beta release, the data in a more user-friendly format called "SOEPlong". We announced this in SOEPnewsletter 80/2008 and thank all those who provided input on this issue. This new, preliminary version of the SOEP data in long format can be obtained upon request. It contains all data and thus can essentially already be used for final analyses. Because it is preliminary, we suggest that it be ordered only by "power users" who would like to work with us to improve data management. We do not recommend the new format for inexperienced users; new SOEP users who want to work with it should at least be familiar with other panel datasets.

The most important improvements in the new data distribution are listed below:

1. New Datasets

1.1 Dataset BIOAGE06
The new data distribution contains the new file BIOAGE06. For the first time in 2008, it includes the information collected using the special mother-child questionnaire, usually from mothers of five- to six-year-old pre-schoolers. The data thus cover a birth cohort that was first “surveyed” in 2002/2003 with a special Newborn Questionnaire. The new data on pre-school-age children contain children’s height and weight, health, care situation, activities with and without the mother, and media usage. Detailed questions address the care situation. Furthermore, valid information on the child’s personality is collected for the first time (based on the “Big Five” personality traits indicator in the main questionnaire for adults), along with information on the child’s socio-emotional behavior (surveyed with a modified version of the Strengths and Difficulties Questionnaire).

1.2 Dataset MOVEDIST
We provide a new dataset on changes of residence. Based on the geo-coordinates at block level, we provide the distance (in meters) between the former and the present residence. However, this information is only available for moves since 2000 and is NOT available on this DVD! We distribute this data together with data on the spatial planning regions (ROR) on an extra CD-ROM. You need an extended data distribution contract including a data protection concept if you want to use this kind of data. After signing your contract extension, you will receive this data on CD-ROM (at no additional cost).

2 New Variables 

2.1 Dataset PPFAD

  • MIGBACK / MIGINFO: MIGBACK provides time-invariant information on an individual’s migration background, derived from the individual’s own and parental data. MIGINFO indicates the sources of the information used, in order to provide users with the highest possible transparency. A detailed description is available in the extensive biography documentation (see chapter on PPFAD).

2.2 Dataset PFLEGE

  • PAY / STUFE: two new variables on paid care (PAY) and the care level (STUFE) according to the German compulsory long-term care insurance.

2.3 Dataset PBIOSPE

The data generation process has been updated completely but without changing the basic principles. Therefore, there are only a few barely discernible deviations in the main variables (due to slight changes in the consistency checks of the data). But there are a number of visible changes in the form of additional variables or additional values in already existing variables. A detailed description is available in our documentation on biography and life history data.

2.4 Dataset BIOPAREN

  • BIO: origin of information is $LELA or $JUGEND
  • ALTER / VALTER / MALTER: age of respondent / father / mother, all at the time of the biography interview.
  • Attention: A bug was discovered in the dataset shortly after the DVD was completed. To update the information on parental religious affiliation, please see our site Known Bugs/Fixes.

3 Revised Variables

3.1 Dataset PWEALTH and HWEALTH
In the year 2007, all individuals aged 17 and up were again surveyed on wealth, just as they were for the first time in 2002. These “raw” data were already part of the standard data distribution for Wave 24 and will be distributed with the upcoming data distribution in a file containing the data for 2002 and 2007 in “long format”: the file PWEALTH for individual data and HWEALTH for data aggregated at the household level. Missing values due to item or partial unit non-response (e.g., missing interviews with individual household members in interviewed households) will be subjected to multiple imputation in complex procedures that take longitudinal information into account. Documentation on this is under preparation. An initial analysis of the new wealth data for 2002 and 2007 is provided in: Joachim R. Frick and Markus M. Grabka. 2009. Wealth Inequality on the Rise in Germany. Weekly Report 5 (10), 62-73.

3.2 Dataset $PEQUIV

3.3 Dataset HHRF/PHRF

3.4 Dataset $PGEN
  • EMPLST$$: A new category has been added to this variable ("Employment status"). From 1998 on, the SOEP data contain information on working in a sheltered workshop for the disabled. Since these persons do not provide information on whether they work full-time, part-time, or on an irregular basis, the new category "sheltered workshop" has been included.

3.5 Dataset $HGEN
The domicile-related variables in the wave-specific $HGEN files have been completely revised. New additions include the full imputation of missing values (due to item non-response) for the housing-related variables number of rooms, heating costs, and gross rent excluding heating, as well as the newly generated variable on utility costs in addition to rent. Finally, “flag variables” show the imputation status where relevant. Experienced SOEP users may also note that various variable names in the file $HGEN have changed.


3.6 Dataset PPFAD
  • TODJAHR / TODINFO: To separate panel mortality from demographic reasons for dropping out of the SOEP sample, TNS Infratest carried out several studies to determine the current residence of panel dropouts, i.e. earlier respondents who no longer take part in the SOEP. This entailed locating 17,195 persons. These investigations identified 981 cases in which the dropout had died. In total, however, 3,791 deaths had been identified in the SOEP up to 2008 (see also the documentation on the variables TODJAHR and TODINFO in the file PPFAD). Additional documentation is available in German from our fieldwork organization TNS Infratest (“Wiederbefragung von Panelausfällen”), along with an English-language summary.

Data distribution 2007 (Wave X)

The 2008 data distribution (1984-2007) provides, for the year 2007, the usual wave-specific data XPBRUTTO, XP, XPKAL, XPGEN, XHBRUTTO, XH, XHGEN, XKIND and WPLUECKE as well as the updated files with a longitudinal component (PFAD files, biography files, spell data and weighting factors).

In the survey year 2006, a representative supplementary sample for all of Germany was added: refreshment sample H. Biographical background information was collected from respondents in sample H for the first time in 2007. This data has been fully integrated into all relevant biography files (BIOxxxx).

As part of the SOEP innovations projects, TNS Infratest Sozialforschung conducted a postal survey in December 2006 among former SOEP panel members from households that had been classified as final refusals in 2001-2004. As a byproduct, we could change the information on year of birth from missing to a valid value for 21 of these persons (more information can be found in the executive summary of the TNS Infratest Methodenbericht).

Furthermore the following additions and modifications have been made:

A. New and Renamed Datasets

COGNIT06:
In the 2006 survey year, short cognitive tests were carried out for the first time with a subsample of the SOEP. The goal was to employ a robust set of instruments that could be administered easily by trained interviewers in just a few minutes. Close to 80% of all persons chosen for participation in the cognitive test provided valid answers. Thus, for the first time, the SOEP now contains indicators of cognitive potential for more than 5,500 persons, along with diverse educational information based on degrees and certifications. The first repeat of the test is planned for the 2010 survey year. Detailed documentation and selection analyses can be found in Schupp et al. (2008) Erfassung kognitiver Leistungspotentiale Erwachsener im Sozio-oekonomischen Panel (SOEP), DIW Berlin, Data Documentation 32.


PBR_EXIT and PBR_HHCH:
These two datasets replace the former dataset YPBRUTTO; however, this year both variants are available.

MIHINC:
Multiply imputed dataset on monthly net household income for the years 1996 to 2007. The dataset is stored in long format (hhnrakt, svyyear, mj; also called mim format within Stata). Each item non-response on net household income was imputed 10 times. More information can be found in HGEN.pdf.
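
For readers new to multiply imputed data, the following sketch shows one common way to use the 10 imputations: estimate the quantity of interest separately within each imputation and then pool the results with Rubin's rules. It is an illustration only, not the procedure used to generate MIHINC. It assumes that mj numbers the imputations, uses a hypothetical column name hinc for the imputed income, and ignores weighting factors.

```python
from statistics import mean, variance


def mean_by_imputation(rows, year):
    """Return {imputation: (mean income, squared standard error)} for one year.

    rows: iterable of dicts in long format with the keys
    hhnrakt, svyyear, mj and hinc (hinc is a hypothetical name).
    """
    incomes = {}
    for row in rows:
        if row["svyyear"] == year:
            incomes.setdefault(row["mj"], []).append(row["hinc"])
    # Unweighted mean and its sampling variance within each imputation.
    return {
        mj: (mean(values), variance(values) / len(values))
        for mj, values in incomes.items()
    }


def pool_rubin(results):
    """Pool per-imputation estimates; returns (estimate, total variance)."""
    estimates = [est for est, _ in results.values()]
    within = mean([var for _, var in results.values()])
    between = variance(estimates)  # needs at least two imputations
    m = len(estimates)
    return mean(estimates), within + (1 + 1 / m) * between


# Toy data: two households and two imputations (MIHINC provides ten).
rows = [
    {"hhnrakt": 1, "svyyear": 2007, "mj": 1, "hinc": 1000},
    {"hhnrakt": 2, "svyyear": 2007, "mj": 1, "hinc": 2000},
    {"hhnrakt": 1, "svyyear": 2007, "mj": 2, "hinc": 1000},
    {"hhnrakt": 2, "svyyear": 2007, "mj": 2, "hinc": 2400},
]
pooled_mean, total_variance = pool_rubin(mean_by_imputation(rows, 2007))
```

The total variance adds the between-imputation spread to the average within-imputation variance, so the uncertainty introduced by imputing is not ignored.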

B. New Variables 

B.1 Dataset XPBRUTTO

  • XEWSTATU: Proxy information on non-responding persons regarding their labor force status in households with partial unit non-response.

 

B.2 Dataset $PEQUIV

  • P11101$$: Copy of the wave specific variables on overall life satisfaction.

B.3 Dataset $HGEN

  • I_HINC$$: Multiply imputed version of HINC$$, the monthly net household income. Imputations 1-5 are available in wide format in $HGEN (only 1996-2007); all 10 generated imputations are available in long format in an extra dataset called MIHINC. Additional information can be found in HGEN.pdf.
  • FHINC$$: Imputation flag for I_HINC$$, 0 means not imputed and 1 otherwise.

C. Revised Variables

C.1 In the Dataset $PKAL

  • $P2D03 + $P2E03: In the waves U-W (years 2004-2006), an incorrect "does not apply" missing code (-2) was corrected to a "no answer" missing code (-1) for some cases.

C.2 In the Dataset HHRF/PHRF

  • WPHRF*: All weighting factors for the year 2006 are now based on microcensus benchmark data from 2006.

    However, the weighting factors for the year 2007 are also based on the (newest available) microcensus benchmark data from 2006; they are therefore only provisional with regard to the figures given for households and individuals in Germany.

  • VHHRF + VHHRF1: 1 household from sample G was corrected and set to 0.

 

C.3 In the Dataset $PGEN

  • LFS$$: The variable "labor force status" has been improved over all waves with respect to the accuracy of classifying individuals as "non-working and older than 65" (category 2). Now, the information on the month of birth of a person is used in order to determine whether the person was older than 65 at the time of the interview.
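
To illustrate why the month of birth matters, here is a minimal sketch that compares an age based on years only with an age that also uses the months. The function names are hypothetical, and the treatment of the birthday month (the birthday counts as reached if the interview takes place in the same month) is an assumption, since the day of birth is not used here.

```python
def age_by_year(birth_year, interview_year):
    """Age computed from calendar years only."""
    return interview_year - birth_year


def age_by_month(birth_year, birth_month, interview_year, interview_month):
    """Completed years of age at the interview, using month information.

    Assumption: an interview in the birthday month counts as after the birthday.
    """
    age = interview_year - birth_year
    if interview_month < birth_month:
        age -= 1  # birthday not yet reached in the interview year
    return age


# A person born in September 1941 and interviewed in March 2006:
coarse = age_by_year(1941, 2006)         # 65
precise = age_by_month(1941, 9, 2006, 3)  # 64
```

In this example the year-only calculation would already place the person at 65, while the month-based calculation shows that the 65th birthday has not yet been reached.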

 

D. Error Updates

D.1 In the Dataset VH and WH

  • We have corrected the value labels for the variables indicating the owner of the dwelling (VH27 and WH27). Please note the relevant corrections in the table below.

    Variable Label: Owner Of The Dwelling

    Value | Wrong | Correct
    -2 | Does not apply | Does not apply
    -1 | No answer | No answer
    1 | Self Owned Res. Property | Local Govt. Apt.
    2 | Local Govt. Apt. | Co-Operative Apt.
    3 | Co-Operative Apt. | Company Apt.
    4 | Company Apt. | Private Owner
    5 | Private Owner | Do Not Know

D.2 In the Dataset $PGEN
  • EGP$$: The variable "Erikson and Goldthorpe Class Category" (international socio-economic index of occupational status) has been corrected with respect to the assignment of individuals to category (18) "not working - pensioner". Up to now, all pension recipients, i.e. recipients of retirement pension and recipients of widow's/orphan's pension have been erroneously classified as "not working - pensioner" if none of the other categories applied. In the corrected generation of the EGP$$ variable, which applies to all waves, non-working persons are only assigned to this category if they are recipients of a retirement pension or if they are recipients of orphan's/widow's pension AND are older than 60 years. Moreover, if there is missing information on pension receipt, additional information from ARTKALEN (retrospective information from the activity calendar for the previous year) is used in the generation process to determine if a person was in retirement or early retirement ("Vorruhestand") at the time of the interview. All other non-working persons are assigned to category (-2) "does not apply" as long as they are not registered as unemployed (category 15).
  • STIB$$: The same problem of misclassification of individuals to the category "pensioner" (13) applied to the variable for the "Occupational position", and has been corrected for all waves in the same way as for EGP$$ .
  • NACE$$: The variable for the "two-digit NACE Industry - Sector" had several inconsistencies with respect to the labeling. In particular, the labels for code (90) "Sewage And Refuse Disposal, Sanitation And Related" and code (95) "Private Households With Employed Persons" had to be swapped. Some other labels were not accurate, and have been stated more precisely for all waves.
  • IS88$$, ISEI$$, MPS$$, SIOPS$$, KLAS$$, EGP$$: The questions underlying these variables are not asked of all employed persons annually. In the survey years 1985, 1986, 1987, 1988, 1990 (West), 1992 (West), 1994, 1996, 1999, 2001, 2003, 2005, and 2006, only those employed persons who changed jobs and first-time respondents are asked to provide up-to-date information. Hence, in years with a partial survey, these variables should contain the available previous year's information for all employed persons without a job change who did not update the information on their current occupation. However, for some individuals, the previous year's data was mistakenly not used. This mistake was corrected by newly generating these variables for all waves in an accurate and consistent way.
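
The carry-forward logic described in the last item can be sketched as follows. This is a simplified illustration, not the actual generation code: the record layout and the key names (year, employed, job_change, value) are hypothetical, and a missing value is represented by None.

```python
def carry_forward(person_years):
    """Fill gaps for one person; person_years must be sorted by year.

    Each entry is a dict with the hypothetical keys year, employed,
    job_change and value (None if no current information was collected).
    """
    filled = []
    prev_year, prev_value = None, None
    for rec in person_years:
        value = rec["value"]
        eligible = rec["employed"] and not rec["job_change"]
        if value is None and eligible and prev_year == rec["year"] - 1:
            value = prev_value  # reuse the previous year's information
        filled.append({**rec, "value": value})
        prev_year, prev_value = rec["year"], value
    return filled


history = [
    {"year": 2004, "employed": True, "job_change": False, "value": 33},
    {"year": 2005, "employed": True, "job_change": False, "value": None},
    {"year": 2006, "employed": True, "job_change": True, "value": None},
    {"year": 2007, "employed": True, "job_change": False, "value": None},
]
values = [rec["value"] for rec in carry_forward(history)]
```

In this toy history, the 2005 value is taken over from 2004, while 2006 stays missing because of the job change, and 2007 stays missing because there is no valid previous-year value to carry forward.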

Data distribution 2006 (Wave W)

The 2007 data distribution (1984-2006) provides, for the year 2006, the usual wave-specific data WPBRUTTO, WP, WPKAL, WPGEN, WHBRUTTO, WH, WHGEN, WKIND and VPLUECKE as well as the updated files with a longitudinal component (PFAD files, biography files, spell data and weighting factors).

In the survey year 2006, a representative supplementary sample for all of Germany was added: refreshment sample H. Detailed information on the integration of this sample and additional changes in both files using weighting and extrapolation factors can be found below (see item 2).

A further important change is the introduction of a new survey instrument for first-time respondents at the age of 17 years. These persons now receive an expanded youth questionnaire, which provides current information as a supplement to the biographical data already collected, thus rendering the previous individual questionnaire used for this group obsolete. This also means that the survey population for the standard individual questionnaire (stored in the files $P) has changed slightly: 17-year-olds have not been included since survey year 2006. The first-time surveying of sample H constitutes an exception: here the 17-year-olds were again surveyed with the individual questionnaire, since the biographical survey in new subsamples starts only with the second wave. The revised $NETTO variables and the file $PAGE17 are of interest in this context (see below).

The educational variables in the generated datasets ($PGEN) have been revised: the integration of vocational qualifications attained abroad has been improved, and the corresponding variables have been subjected to extensive testing for consistency. These variables will be described in greater detail below.

The information on twins in SOEP was validated by a special survey of "potential" twins, and is integrated into the dataset BIOTWIN.

This year as in all previous years, the variables contained in the file WPEQUIV (wave 2006) relating to previous year's income take into account the various structural changes in the tax and transfer system, using these as part of the basic informational framework for generating and simulating annual income. Not only do the changes in the 2005 tax rate (reduction of the top tax rate, personal exemption) play an important role here but also the new guidelines contained in the Old Age Income Act (Alterseinkünftegesetz). The introduction of Unemployment Benefit II (Arbeitslosengeld II) also plays an important role, along with the extensive changes in the transfer system it entails (Social Security, Rent Subsidy, etc.). The generated information on (previous) year's income from SOEP survey year 2006 has thus been subject to thorough testing for internal and external consistency.

This year, the data is being distributed for the first time on DVD. This means that the language of variable and value labels can be chosen even more easily: right in the SOEP data installation program. If you install the data in Windows Vista using our setup program, please follow the installation instructions on DVD.

The following additions and modifications have been made:

New and Renamed Datasets 2006  

$PAGE17
From 2007 on, persons who have reached the age of their first individual SOEP interview (17 years) are not given the usual individual questionnaire but a special youth questionnaire. Wave-specific information not contained in the biographical data or other generated datasets (like $PGEN, HEALTH) is given in the dataset $PAGE17. Youth questionnaire respondents are identifiable with the help of the new $NETTO code "17" (see also the changes in the $NETTO variables in PPFAD). More information can be found in the biography documentation on our homepage and on the new DVD.

DESIGN
Starting in 2007, the information on SOEP sample design previously compiled in the dataset VARIANZ (Spiess 2001) is disseminated in a revised and amended dataset DESIGN. Preliminary documentation can be found in designdoku.pdf on our homepage and on the new DVD.

HEALTH
Starting with 2002, the SOEP health module in the individual questionnaire has been revised and put on a two-year replication period. In the HEALTH file, users find the generated SF-12 variables (measuring health-related quality of life) as well as variables on height and weight with imputation flags and a user-friendly, longitudinally checked generated variable of the Body Mass Index (BMI). More information can be found in health.pdf on the SOEP homepage or on the new DVD.

PWEALTH and WEALTH
The wealth data collected in 2002 were thoroughly revised and checked for inconsistencies. The data are now provided in two (multiply) imputed datasets for the individual and the household level, with the corresponding flag variables for identification of the imputed values. The two datasets also each contain a generated variable on "net wealth" (see SOEPpapers No. 18).

Interviewer Survey
The interviewer dataset, available up to 2006 only as a "stand-alone" version, is now integrated into the standard data distribution under the name INTVIEW and thus provided in the different software formats (SAS, SPSS, STATA).

Cross-Sectional Weighting Scheme 2006  

With the 2006 data distribution, important changes have been made in the cross-sectional weights. They are described in detail (in German) in the DIW Data Documentation 22.

1. Types of Weighting Factors Redefined
Each cross-sectional weight is designated $xHRFy. Here, $ represents the wave identifier, x the differentiation between households (x = H) and persons (x = P) and y an additional identifier that describes the type of weighting factor.

  • $xHRF are the weighting factors that have been used since the beginning. They contain all samples with the exception of high-income sample G.
  • $xHRF1 are the standard weighting factors, where, in addition to the exclusion of sample G, the weights of new subsamples have been set to zero. Why? On complex survey constructs (for example life satisfaction and annual income), respondents in their first waves showed "worse" answering behavior than respondents in later waves. Sample C is an exception: respondents in the former GDR in 1990 did not exhibit the typical problems of first-time respondents (that is, GxHRF and GxHRF1 are identical).
    For standard cross-sectional analyses, we recommend the use of the $xHRF1 as a standard weighting factor. In this way, the information from the first waves of the different subsamples is automatically left out.
  • $xHRFALL include all available samples.
  • $xHRFD, $xHRFF and $xHRFG designate the isolated weights for immigrant sample D, for refreshment sample F and for high-income sample G.
  • The variable $PHRFXX in PHRF and HHRF has been deleted.
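
The naming scheme above can be sketched as a small helper that builds the variable name for a given survey year. The helper names are hypothetical. The mapping from year to wave identifier assumes one letter per survey year starting with A for 1984, which is consistent with the waves named on this page (W = 2006, X = 2007, Y = 2008); the sketch only covers 1984 to 2008.

```python
# Types of weighting factor listed above ("" is the original $xHRF).
WEIGHT_TYPES = ("", "1", "ALL", "D", "F", "G")


def wave_letter(year):
    """Wave identifier for a survey year (assumes A = 1984, one letter per year)."""
    if not 1984 <= year <= 2008:
        raise ValueError("sketch only covers survey years 1984-2008")
    return chr(ord("A") + year - 1984)


def weight_name(year, level, kind="1"):
    """Build a cross-sectional weight name such as WPHRF1.

    level: "H" for households or "P" for persons.
    kind:  one of WEIGHT_TYPES; "1" is the recommended standard weight.
    """
    if level not in ("H", "P"):
        raise ValueError("level must be 'H' or 'P'")
    if kind not in WEIGHT_TYPES:
        raise ValueError("unknown type of weighting factor")
    return f"{wave_letter(year)}{level}HRF{kind}"


standard_2006 = weight_name(2006, "P")        # "WPHRF1"
all_samples_2005 = weight_name(2005, "H", "ALL")  # "VHHRFALL"
```

Defaulting kind to "1" mirrors the recommendation to use $xHRF1 for standard cross-sectional analyses.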

2. Modifications to the External Information Used in the Weighting Scheme
Since the year 2005, the Federal Statistical Office no longer provides data differentiating between East and West Berlin. This has led to minor retrospective changes in the external information on the number of households since survey year 2005.

3. New Refreshment Sample H
In 2006, SOEP expanded to include another sample, refreshment sample H. The new households, which are representative for Germany as a whole, were also included in the weighting scheme. The integration of sample H is currently underway. Tests are still being conducted to determine if and how sample H can be further adapted to the external information. This is not a serious problem since, in any case, we recommend the use of the weighting factors WxHRF1, which exclude sample H, for descriptive analyses.

4. Weighting Factors are Based on Benchmark Data from the 2005 Microcensus
The weighting factors for the year 2006 are based on microcensus benchmark data from 2005; they are therefore only provisional with regard to the figures given for households and individuals in Germany. Please address any questions to .

BIOAGE01 and BIOAGE17 2006  

1. BIOAGE01
Four new variables on pregnancy status have been generated, based essentially on the month of the interview from $P and the month and year of the child's birth, as well as the duration of pregnancy in weeks from BIOAGE01.

BCPREGY 'Mother: pregnant at the time of individual interview wave ($)?'
Value Labels:
2002 | Pregnant at Time of Personal Interview 2002
2003 | Pregnant at Time of Personal Interview 2003
2004 | Pregnant at Time of Personal Interview 2004
2005 | Pregnant at Time of Personal Interview 2005
2006 | Pregnant at Time of Personal Interview 2006
2007 | Pregnant at Time of Personal Interview 2007

BCPREGMO 'Mother: estimated month of pregnancy at the time of individual interview, wave($)'
Value Labels:
1 | First Month of Pregnancy
2 | Second Month of Pregnancy
3 | Third Month of Pregnancy
4 | Fourth Month of Pregnancy
5 | Fifth Month of Pregnancy
6 | Sixth Month of Pregnancy
7 | Seventh Month of Pregnancy
8 | Eighth Month of Pregnancy
9 | Ninth Month of Pregnancy
10 | Last Month of Pregnancy or after Birth

Furthermore the beginning and end of pregnancy are also available as spell data. Analogously to BIOMARSM, for example, we start counting with month 1 (January 1983), such that December 2007 is month 300. The data are generated based on month of birth and duration of pregnancy in weeks from BIOAGE01.

PREGBEGM 'Spell - Month beginning of pregnancy / conception (1 = Jan 1983)'

PREGENDM 'Spell - Month end of pregnancy / Birth (1 = Jan 1983)'
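
The month counting used for the spell variables can be reproduced with a short conversion, shown here as a sketch. The function names are hypothetical; the only fact taken from the text is that month 1 is January 1983.

```python
BASE_YEAR = 1983  # month 1 corresponds to January 1983


def to_spell_month(year, month):
    """Convert a calendar year and month (1-12) to the running month count."""
    if not 1 <= month <= 12:
        raise ValueError("month must be between 1 and 12")
    return (year - BASE_YEAR) * 12 + month


def from_spell_month(index):
    """Convert a running month count back to (year, month)."""
    years, month0 = divmod(index - 1, 12)
    return BASE_YEAR + years, month0 + 1


december_2007 = to_spell_month(2007, 12)  # 300, as stated above
```

The same conversion applies to PREGBEGM and PREGENDM, for example when computing the number of months between the beginning and the end of a pregnancy spell.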

2. BIOAGE17
You will find detailed information on the structure and the content of the dataset in the documentation of the biographical data on our SOEP homepage or the DVD.

$HGEN 2006

NUTS1$$
In addition to the Bundesland (federal state) variable, starting this year, the corresponding NUTS (Nomenclature of Territorial Units for Statistics) Level 1 Variable is also provided. This variable is generally identical with $BULA in $HBRUTTO but without pooling Rheinland-Pfalz/Saarland (from 2000 on) and without differentiating between East and West Berlin.

$PGEN 2006 

1. New Variables
JOBCH$$

A variable for identification of job change was generated to supplement ERWTYP$$ (and eventually to replace it). The categories for this variable are independent of whether the information was obtained in a first-time or a subsequent interview. For respondents to a subsequent interview, JOBCH$$ refers to job changes since the last interview; for first-time respondents, it refers to job changes since the beginning of the previous year. Respondents who started their first job and respondents who made a job change are reported separately. In contrast to ERWTYP$$, JOBCH$$ has been subjected to a check for longitudinal consistency. Cases showing inconsistencies, such as duplicate entries of the same job change in two subsequent interviews, have been corrected.
Value Labels:
1 | Not Employed
2 | Employed No Change
3 | Employed No Info If Change
4 | Employed With Change
5 | First-Time Employed

2. Revised Variables
GERWZEIT, HERWZEIT
For the years 1990 and 1991, values for job tenure are now provided for sample C (East) as well. Given the potentially limited comparability due to the East German transformation process, this data should be handled with particular care.

$ERWZEIT
Job tenure has been tested for longitudinal consistency due to repeated evidence of inconsistencies. Cases that proved longitudinally inconsistent were corrected using the following procedure:

  1. Start of employment at current job as stated in the respondent's first survey is generally given precedence, and is carried on in subsequent years if no change of job occurred or the respondent did not take a new job after a break in employment.
  2. In the case of a change of job (change of employer / change to self-employment) current data on the time of job change is used and carried on in subsequent years.
  3. In the case where a respondent has taken up a new job after a break in employment, we assume that he or she returned to the old employer if the current data show a start of employment prior to the last survey year. In this case, we do not use the start of employment provided in the current survey but the start of employment from the last survey. If the current data show a start of employment since the last survey year, however, we assume that the respondent changed employer since the previous survey, and update the start of employment using the data from the current survey.

From the longitudinally consistent start of employment with current employer, we determine the duration of job tenure. When a respondent who started working again after a break can be assumed to have returned to his or her former employer, the full duration of job tenure is taken. The period of the break in employment is then not subtracted, potentially resulting in an implicit overestimation of firm-specific human capital.
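
The three rules can be condensed into a small decision function. This is a sketch under simplifying assumptions, not the actual generation code: it works with calendar years only, all parameter names are hypothetical, and it assumes that the current survey always reports a start of employment.

```python
def consistent_job_start(prev_start, reported_start, last_survey_year,
                         job_change=False, reentry=False):
    """Return the longitudinally consistent start year of the current job.

    prev_start:       consistent start year from the last survey (None in
                      the respondent's first survey)
    reported_start:   start year stated in the current survey
    last_survey_year: year of the respondent's last survey
    job_change:       change of employer or change to self-employment
    reentry:          new job taken up after a break in employment
    """
    if prev_start is None:
        return reported_start  # first survey: the stated start is used
    if job_change:
        return reported_start  # rule 2: current data on the job change
    if reentry:
        if reported_start < last_survey_year:
            return prev_start  # rule 3: assumed return to the old employer
        return reported_start  # rule 3: assumed change of employer
    return prev_start          # rule 1: carry on the earlier information


def job_tenure(start_year, survey_year):
    """Tenure in years; a break in employment is not subtracted."""
    return survey_year - start_year


start = consistent_job_start(1995, 1996, 2004, reentry=True)
tenure = job_tenure(start, 2005)
```

In the example, a respondent re-enters employment and reports a start before the last survey year, so the earlier start (1995) is kept and the full duration is counted as tenure.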

AUSB$$
Since 1999, the required job training variable has distinguished between studies at universities and technical colleges, and now, different categories have also been created for the years prior to and since 1999. For the years since 1999, separate categories have been introduced explicitly differentiating among these different kinds of educational qualifications. Furthermore, technical colleges and technical schools are now designated separately.
AUSB$$ 'required job training'
Value Labels:
1 | No Training
2 | Introduction to Job
3 | On-The-Job Training
4 | Courses
5 | Vocational Training
6 | Technical School, Engineering (East) 1990-96
7 | Technical College or University, up to 1998
8 | Technical College, since 1999
9 | University, since 1999

MPS$$
For waves U, V, and W, values for Wegener's Magnitude Prestige Scale have been added for respondents without a household interview ($NETTO=19).

ERWTYP$$
For the employment type variable, the old categories have been maintained but due to the common value for first-time job holders and those who have made a job change, the label for this category has been changed. Thus, the label 'employed, with change or first time employed' is now applied to the value 6.


3. Update Educational Variables
Thanks to our users, an error was identified in the generation of the educational variables in $PGEN, which had crept in some time ago in the process of retrospective generation for the years 2000 and 2001 and continued on since then. The error was in the variable $PBBIL02, and consisted in assigning foreign university degrees too high a value. The error came about through the integration of the variables $PBBILA and $PBBIL02 in these two years. All educational degrees have therefore now been generated again retrospectively for the years 2000 to 2006. The resulting variables $BILZEIT, ISCED$$ and CASMIN$$ have also been updated retrospectively from 2000 on.

PPFAD 2006 

Revision of the $NETTO Codes

$NETTO
With this year's wave W (23rd survey wave), 2006, the compilation of data on the survey population has changed fundamentally. Previously, an individual interview was carried out with all household members above the age of 16. As of 2006, the regular individual interviews based on the standard adult questionnaire are introduced one year later when household members reach the age of 18. Seventeen-year-olds instead receive an expanded youth questionnaire in their first year as SOEP respondents. (This applies to the old samples A-G; for the new sample H, distribution of this youth questionnaire will start next year, while this year's 17-year-olds have received the regular individual questionnaire, in line with the old system).
This means that we now have two instruments instead of one to obtain data on respondents: the individual and the youth questionnaire. To ensure a consistent differentiation over time, it will therefore be necessary either to include the youth population of the current year or to increase the age limit for all previous years.
The newly revised $NETTO variable assists retrospectively in both differentiations for the entire survey period. The connection between survey population and survey instrument can be retraced with the help of the variable $NETTO in PPFAD or $HNETTO in HPFAD. As a result of the change in the survey population as well as the expansion of the survey instrument to include detailed information on biographical contexts, the corresponding variable $NETTO in PPFAD has been fundamentally revised and is now provided as a two-digit variable. To ease the transition to the new variable, the old one-digit variable is still provided as well under a different name $NETOLD; the variable $HNETTO in HPFAD is unaffected by this and remains unchanged.
Value Labels:
10 | Respondent Completed Interview
11 | Individual Questionnaire
12 | Individual Questionnaire and Biography
13 | Individual and Youth Questionnaire
14 | Individual and other Questionnaires
15 | Individual Questionnaire and Experiments, Tests
16 | Individual Questionnaire, First-Time Respondent, Age 17
17 | Youth Questionnaire, First-Time Respondent, Age 17
19 | Individual Questionnaire without Household Interview

20 | Children in Household Interviewed ($KIND)
21 | Children with Mother-Child Questionnaire I, Age 0-1
22 | Children with Mother-Child Questionnaire II, Age 2-3

30 | Persons in successfully interviewed household without Individual Interview
31 | Completed Gap Interview ($LUECKE)
32 | Completed Biography Questionnaire
33 | Successful Youth Questionnaire
34 | Successful Tests and Experiments

60 | Only Questionnaire without Individual or Household Interview
61 | Gap Interview without household reference
62 | Gap Interview with drop out
70 | Only Participation in Tests, Experiments, etc.

80 | Individual did not withdraw from panel population
81 | Previous respondent lacking current information
89 | Repatriate - (was Drop Out)

90 | Individual Dropouts $YPBRUTTO
91 | Moved abroad
99 | Died
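
The layout of the value labels suggests that the tens digit of the two-digit $NETTO code identifies the broader group and the units digit the specific case. A minimal Python sketch under that assumption follows; the helper name `netto_group` and the dictionary are illustrative and not part of the SOEP data, and the group texts are taken from the list above.

```python
# Group headings copied from the value-label list; keyed by the tens digit.
# Assumption: every code between two headings belongs to the heading above it.
NETTO_GROUPS = {
    1: "Respondent completed interview",
    2: "Children in household interviewed",
    3: "Person in successfully interviewed household without individual interview",
    6: "Only questionnaire without individual or household interview",
    7: "Only participation in tests, experiments, etc.",
    8: "Individual did not withdraw from panel population",
    9: "Individual dropout",
}


def netto_group(code: int) -> str:
    """Return the group heading for a two-digit $NETTO code (hypothetical helper)."""
    group = NETTO_GROUPS.get(code // 10) if 10 <= code <= 99 else None
    if group is None:
        raise ValueError(f"no group known for $NETTO code {code}")
    return group


print(netto_group(17))  # youth questionnaire, first-time respondent, age 17
print(netto_group(31))  # completed gap interview
```

Codes outside the listed groups (for example 40 to 59) raise an error rather than being silently assigned.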

$NETOLD
In the new variable $NETOLD, the old $NETTO code can still be used. Persons aged 17 who filled out either a youth questionnaire (n=307) or an individual questionnaire (sample H, n=31) are each coded with the value 1. As a result, the selection on (WNETTO == 1 | WNETTO == 5) is not identical with the population in WP.
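
The two consistent population definitions mentioned above (including the youth population, or raising the age limit) might be expressed with the new two-digit codes roughly as follows. This is only a sketch in Python: the function names are hypothetical, and it assumes that codes 16 and 17 mark all 17-year-old first-time respondents in every wave, which should be checked against the PPFAD documentation.

```python
INDIVIDUAL_AGE_17 = 16  # Individual Questionnaire, First-Time Respondent, Age 17
YOUTH_AGE_17 = 17       # Youth Questionnaire, First-Time Respondent, Age 17


def is_respondent_incl_youth(netto: int) -> bool:
    """Variant 1: all respondents with a completed interview, youth included."""
    return 10 <= netto <= 19


def is_respondent_excl_age_17(netto: int) -> bool:
    """Variant 2: as above, but without the flagged 17-year-old first-timers."""
    return is_respondent_incl_youth(netto) and netto not in (
        INDIVIDUAL_AGE_17,
        YOUTH_AGE_17,
    )


codes = [11, 12, 16, 17, 20, 31, 99]
print([c for c in codes if is_respondent_incl_youth(c)])
print([c for c in codes if is_respondent_excl_age_17(c)])
```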

$PEQUIV 2006 

1. New Variables
ALG2$$: Sum of all transfers from Unemployment Benefit II (Arbeitslosengeld II) received by the household.
FALG2$$: Flag to identify the imputation of Unemployment Benefit II (ALG2$$).
IDEMY$$: Sum of indemnity payments received in the previous year.
FDEMY$$: Flag to identify the imputation of indemnity payments (IDEMY$$).
ITRAY$$: Sum of commuting and travel grants received in the previous year.
FTRAY$$: Flag to identify the imputation of commuting and travel grants (ITRAY$$).

2. Revised Variables
I11105$$
The variable (rental value of personally used living space = imputed rent) has been generated up to now only for persons living in owner-occupied housing. In line with recent research findings and also European Community guidelines for the generation of imputed rent in EU-SILC, this fictitious income advantage is now generated for persons in rental households as well who claim to pay below-market rental prices. These include people in rent-free housing, in socially subsidized housing, and in rental properties offered at a special rate (company dwellings, apartments provided by relatives at reduced rent, etc.).

W11101$$ and W11102$$
Due to the changes to the weighting factors in the files PHRF and HHRF, the variable W11101$$ now contains the individual weighting factor $PHRF1 (from the file PHRF) and the variable W11102$$ now contains the household weighting factor $HHRF1 (from the file HHRF).
First-time SOEP respondents show a significantly higher rate of item non-response in their first wave, which cannot be corrected adequately through imputation. For this reason, these two weights do not take into account the first wave of each new SOEP subsample. Furthermore, high-income subsample G has been excluded from the weighting scheme in order to prevent structural breaks in the analysis of income with vs. without this subsample. These two weighting variables are thus particularly well suited to a consistent time series of income inequality analysis.

W11105$$
The variable W11105$$ now contains the individual weighting factor $PHRFALL (from the file PHRF). This weighting variable takes into account all SOEP subsamples.
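
To illustrate how the choice of weighting variable enters an analysis, here is a minimal weighted-mean sketch in Python. It assumes that observations not covered by a weighting scheme carry a weight of zero (or a missing weight) and are therefore skipped; the function name and the example numbers are made up for illustration.

```python
def weighted_mean(values, weights):
    """Weighted mean that skips missing values and non-positive weights."""
    weighted_sum = 0.0
    weight_sum = 0.0
    for value, weight in zip(values, weights):
        if value is None or weight is None or weight <= 0:
            continue  # assumed: not covered by this weighting scheme
        weighted_sum += weight * value
        weight_sum += weight
    if weight_sum == 0:
        raise ValueError("no observation with a positive weight")
    return weighted_sum / weight_sum


incomes = [10, 20, 30]
print(weighted_mean(incomes, [1, 1, 2]))  # all cases weighted
print(weighted_mean(incomes, [0, 1, 2]))  # first case excluded by a zero weight
```

Running the same statistic once with W11101$$ and once with W11105$$ as the weight shows how much a result depends on the excluded first waves and subsample G.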

E11105$$
The content of the variable E11105$$ is now based on the ISCO88 International Standard Classification of Occupations.

E11106$$ and E11107$$
The variables E11106$$ and E11107$$ now provide information on sector affiliation in the form of a one or two-digit number according to the NACE scheme, the International Standard Industrial Classification of all Economic Activities.

3. Deleted Variable
W11106$$ 'HH-Weight immigrant sample'

BIOBIRTH; BIOBRTHM 

KIDMON[n]
With wave W, the birth biographies of men (BIOBRTHM), like those of women (BIOBIRTH), include not only the year of birth (KIDGEB[n], with n = 1...15) but also the month of birth for each child (KIDMON[n]). This birth month is identical with the child's birth month given in PPFAD.
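
Because the children's birth dates are stored in up to 15 parallel columns per parent, analyses often start by reshaping them into one record per child. The following Python sketch shows one way to do this. The exact column names (here assumed to be zero-padded, e.g. KIDGEB01) and the coding of missing values as negative numbers are assumptions; the `name` parameter lets you adapt the naming.

```python
def children_long(row, max_children=15, name=lambda stem, n: f"{stem}{n:02d}"):
    """Turn the wide KIDGEB[n]/KIDMON[n] columns of one parent into a child list."""
    children = []
    for n in range(1, max_children + 1):
        year = row.get(name("KIDGEB", n))
        if year is None or year < 0:
            continue  # no n-th child recorded (assumed: missing is negative)
        month = row.get(name("KIDMON", n))
        if month is None or month < 0:
            month = None  # year known, month not available
        children.append({"child": n, "birth_year": year, "birth_month": month})
    return children


parent = {"KIDGEB01": 1990, "KIDMON01": 5, "KIDGEB02": 1993, "KIDMON02": -1}
for child in children_long(parent):
    print(child)
```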

BIOTWIN 

In 2006, a separate survey was carried out in all households with twins. This twin survey had the goal of validating the data on all twins in SOEP and gaining new information. The following variables have been changed or added in BIOTWIN as a result:

BIOMONOZ
The variable BIOMONOZ differentiates between identical and fraternal twins based on a question asked to first-time respondents. This information used to be obtained through a question asking whether the twins were of the same or different sexes. New codes have been introduced for the variable BIOMONOZ to reflect the improved information available. The values are thus no longer compatible with those from prior to wave W contained in variable BIOMONOZ in the dataset BIOTWIN.

INFOTWIN
The variable INFOTWIN has been introduced. This variable tells whether information on twins was given in the 2006 twin survey, whether the information was derived from previously existing SOEP data, and whether previously existing data on the twins coincides with the results of the twin survey.

EGP$$  

The variable "Erikson and Goldthorpe Class Category" (International Socio-Economic Index of Occupational Status) has been corrected with regard to the categorization of freelance academics, who were previously grouped together with the self-employed (values of 5 or 6). The corrected generation process assigns academic freelancers to the upper service class, which corresponds to a value of 1.


Data distribution 2005 (Wave V)

The 2006 SOEP data distribution (1984-2005, Waves A-V) includes the usual wave-specific data VPBRUTTO, VP, VPKAL, VPGEN, VHBRUTTO, VH, VHGEN, VKIND and UPLUECKE, as well as updated versions of all datasets with a longitudinal component (spell data, biographical data, and weights).
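
The wave-specific file names follow from the survey year: the prefix letter counts up from A in 1984, and the gap file ($LUECKE) carries the letter of the previous wave, as in the list above. A small Python sketch of this convention follows; the helper names are hypothetical, and the suffix list is simply the one given in this section, which may not apply to every early wave.

```python
import string

FIRST_YEAR = 1984  # wave A
SUFFIXES = ("PBRUTTO", "P", "PKAL", "PGEN", "HBRUTTO", "H", "HGEN", "KIND")


def wave_letter(year: int) -> str:
    """Map a survey year to its wave letter (1984 -> A, 2005 -> V)."""
    index = year - FIRST_YEAR
    if not 0 <= index < len(string.ascii_uppercase):
        raise ValueError(f"no single-letter wave for year {year}")
    return string.ascii_uppercase[index]


def wave_files(year: int) -> list:
    """List the wave-specific file names for one survey year."""
    letter = wave_letter(year)
    files = [letter + suffix for suffix in SUFFIXES]
    if year > FIRST_YEAR:
        # The gap file is named after the previous wave.
        files.append(wave_letter(year - 1) + "PLUECKE")
    return files


print(wave_letter(2005))
print(wave_files(2005))
```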

The first CD-ROM contains, as usual, all SOEP data with variable labels and value labels in German, and the second contains all SOEP data with variable labels and value labels in English.

Please also note the following improvements and changes:

New and renamed datasets 2005 

With the current data distribution, we renamed all SOEP datasets based on age-specific biographical questionnaires (e.g., "Mother and Child") in a more consistent manner. Since all these datasets are saved in long format, the names now start with "BIOAGE" and a two-digit suffix. This suffix gives the maximum age of the individuals in question during the survey year.

BIOAGE01
New name for the dataset previously known as BIOCHILD (based on the questionnaire for mothers with a newborn child below the age of 15 months).

BIOAGE03
New dataset based on mother-and-child questionnaire for mothers with a child between the ages of 2 and 3 years. For further information, please see the biographical data documentation.

BIOAGE17
New name for the dataset previously known as BIOYOUTH (based on a survey of adolescents between 16 and 17 years old).
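
Existing programs that still refer to the old dataset names can be adapted with a small lookup. The Python sketch below uses only the two renamings listed here; the function names are illustrative.

```python
RENAMED = {"BIOCHILD": "BIOAGE01", "BIOYOUTH": "BIOAGE17"}


def current_name(dataset: str) -> str:
    """Return the new dataset name, or the name unchanged if it was not renamed."""
    upper = dataset.upper()
    return RENAMED.get(upper, upper)


def max_age(dataset: str) -> int:
    """Read the maximum age from the two-digit suffix of a BIOAGE dataset name."""
    upper = dataset.upper()
    if not upper.startswith("BIOAGE") or not upper[6:].isdigit():
        raise ValueError(f"{dataset} is not a BIOAGE dataset name")
    return int(upper[6:])


print(current_name("biochild"), max_age("BIOAGE03"))
```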

Weighting 2005

The 2005 cross-sectional weights are provisional; an update of VPHRF and VHHRF will be released in fall 2006.

The wave-specific projection and weighting variables will be adjusted annually to external official data to ensure the accuracy of marginal distributions on age, sex, household size and nationality. The source of the data is the German Federal Statistical Office's official microcensus. From 2005 on, the data on Berlin will no longer be reported separately for the areas comprising former West Berlin / East Berlin; rather, Berlin will be considered part of East Germany. As a consequence, the data required to adjust our weights to the official marginal distributions will not be available before fall 2006.

To prevent this from causing a delay in the distribution of the SOEP data up to Wave V (2005), the weights (VPHRF* and VHHRF*) have been adjusted to the data used for Wave U (2004).

From our experience, there is a very low deviation in the benchmark data over the years (the new definition for West Berlin / East Berlin being one exception). Please keep in mind the provisional nature of the weighting scheme, and indicate this explicitly in any publications using the weights for Wave V. We will inform you as soon as the final version, based on the 2005 microcensus data, becomes available via the SOEP NEWSLETTER and listserver.

$HGEN 2005 

AHINC$$
The adjusted screener (AHINC$$) is now available for all waves (Exception: Sample C in 1990/1991).  

$PGEN 2005 

ALLBET$$ (new)
Coarse categories for the size of the company. A consistent variable over all waves for the size of the company ("least common denominator" of the variable BETR$$).

Categories:

  1. "less than 20"
  2. "20 to 200"
  3. "200 to 2000"
  4. "2000 and above"
  5. "Self-employed with no other employees"

BETR$$ (revised):

The variable BETR$$ now has eleven instead of nine categories. The reason is the more detailed questions from Wave V onwards. The old category "5 to 20 employees" is now split into two categories ("5 to 10 employees" and "11 to 20 employees").

The new categories are:

  1. "less than 5"
  2. "GE 5 LE 10"
  3. "11 LT 20"
  4. "up to 1990: LT 20"
  5. "1991-2004: 5 LT 20"
  6. "GE 20 LT 100"
  7. "GE 100 LT 200"
  8. "up to 1998: GE 20 LT 200"
  9. "GE 200 LT 2000"
  10. "GE 2000"
  11. "Self-employed without employees"

TIP: The variable ALLBET$$ in the dataset $PGEN offers consistent data on company size throughout all waves of the SOEP, although with fewer categories in a less detailed classification.
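
The relation between the two variables can be pictured as a collapse of the eleven BETR$$ categories into the five ALLBET$$ categories. The Python mapping below is inferred from the category labels listed above and is not the official generation code; it also assumes that negative values are missing codes and should be passed through unchanged.

```python
# BETR$$ category -> ALLBET$$ category, inferred from the labels.
BETR_TO_ALLBET = {
    1: 1, 2: 1, 3: 1, 4: 1, 5: 1,  # all categories below 20 employees
    6: 2, 7: 2, 8: 2,              # 20 up to fewer than 200
    9: 3,                          # 200 up to fewer than 2000
    10: 4,                         # 2000 and above
    11: 5,                         # self-employed without employees
}


def collapse_betr(betr: int) -> int:
    """Collapse a BETR$$ code to the coarser ALLBET$$ scheme (illustrative)."""
    if betr < 0:
        return betr  # assumed missing code, kept as is
    return BETR_TO_ALLBET[betr]


print([collapse_betr(code) for code in (3, 8, 11, -1)])
```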

EMPLST$$ (new):
Employment Status. A consistent variable over all waves to differentiate employment status (in addition to the variable LFS$$, which differentiates non-employed persons).

Categories:

  1. "Full-time employment"
  2. "Regular part-time employment"
  3. "Vocational training"
  4. "Marginal, irregular part-time employment"
  5. "Not employed"

EXPFT$$ (new):
Working experience full-time employment. Coverage of complete working experience in full-time employment (in years, one digit after the decimal point).

EXPPT$$ (new):
Working experience part-time employment. Coverage of complete working experience in part-time employment (in years, one digit after the decimal point).

EXPUE$$ (new):
Unemployment experience. Coverage of unemployment experience throughout the entire period of working life (in years, one digit after the decimal point).


$PEQUIV 2005  

SSOLD$$ (new):
Social assistance to the elderly ("Grundsicherung im Alter").

FSSOLD$$ (new):
Imputation flag: Social assistance to the elderly.

LOSSR$$ (new):
Losses from renting and leasing.

FLOSSR$$ (new):
Imputation flag: losses from renting and leasing.

LOSSC$$ (new):
Losses from capital investment.

FLOSSC$$ (new):
Imputation flag: losses from capital investment.

D11112LL (new):
Race of individual

D11110$$ (erased):
Data already included in the variable M11124$$.

D11111$$ (erased):
Data already included in the variable M11125$$.


Bug fixes  

Correction of [T-U]HPOP in HPFAD.
Correction of some individual and household weights for the years 2003 and 2004 (THHRF, UPHRF, and UHHRF). 


Data Distribution 2004 (Wave U)

PPFAD 2004

LOC1989
The basic demographic information in PPFAD has been expanded to include location of residence in 1989, i.e., where an individual lived when the Berlin wall fell (variable LOC1989). This information is differentiated into the categories "East Germany", "West Germany", and "Abroad" and is available for all respondents (adults and children, see further documentation in Biography and Life History Data).

PGEN 2004 

LABGRO$$ and LABNET$$
New variables have been generated for all waves (A-U) providing information on monthly gross and net labor income (LABGRO$$ and LABNET$$), consistently declared in euro. Missing values in case of item non-response are imputed as indicated by the corresponding imputation flag variables IMPGRO$$ and IMPNET$$ respectively (see also the additional documentation in PGEN.PDF).

HGEN 2004  

HINC$$
$HGEN now includes the monthly net household income consistently named (HINC$$) and declared in euro over all waves (A-U).

AHINC$$
A new variable has been generated for waves L-U (1995-2004) providing information on monthly net household income adjusted for possible underreporting (AHINC$$), also consistently declared in euro. Possible underreporting is checked with the help of the current individual incomes of all household members (see also the additional documentation in HGEN.PDF).

$PEQUIV or SOEP-CNEF 2004  

M11101$$-M11127$$
The files $PEQUIV now also include a set of cross-nationally harmonized health-related variables M11101$$-M11127$$ (see also the additional documentation in the Codebook for the $PEQUIV File 1984 - 2004).


Data Distribution 2003 (Wave T)

The data of the German SOEP (100% version) are distributed on three CD-ROMs covering the years 1984-2003. New data sets for the survey year 2003 are the usual wave-specific data TPBRUTTO, TP, TPKAL, TPGEN, THBRUTTO, TH, THGEN, TKIND and SPLUECKE. There are also updates of data sets with a longitudinal component (biographical data and weights). The information collected for the first time in 2003 in the biographical questionnaire for sample G ("high-income sample") has been completely integrated into the user-friendly biographical data sets.

As of this year, the data on CD-ROM #2 also contains all SOEP data with variable labels and value labels in English (including the data from the 1988 financial statement in file EV).

In addition, we have made the following additions and changes:

Sample G "High Income Sample" (Start 2002)  

The revised sampling design, using a higher income threshold, results in a smaller number of observations in wave 2.

HHRF and PHRF 2003 

The standard weighting variables for waves S and T (SPHRF, TPHRF or SHHRF, THHRF) are based on sub-samples A-F, that is, without considering high-income sample G. In addition, we now offer a new integrated weighting variable for all sub-samples A-G (variables $PHRFAG or $HHRFAG; see also the documentation on the integrated weights for A-G vs. A-F).

Rectypes 2003

1. BIOCHILD: Information from the 'Mother and Child Questionnaire'
In this new file, information on newborns in the SOEP will be collected each year from now on (see further documentation in Biography Data).

2. BIORESID: Information on second residence in the first interview
The data set BIORESID includes information on length of residency, and on second residence. The information comes from the biographical questionnaire, which has consistently contained questions on this since 1994 (see further documentation in Biography Data).
Contact: Thorsten Schneider

3. BIOBRTHM: Birth biography information for men - from 2001 on
This new data set includes information on the birth biographies of men interviewed with this modified questionnaire since 2001. BIOBRTHM is structured analogously to BIOBIRTH, based on a question formerly only answered by women (see further documentation in Biography Data).

4. BIOTWIN: data for identifying births of twins, triplets, etc.
BIOTWIN includes all identifiable births of twins, triplets, etc. in the SOEP. Identifiers (PERSNR) for the mother and siblings are included (see further documentation in Biography Data).

5. HBRUTT98:
This new file contains the complete gross population of sample E in the year 1998. It is useful in attrition analysis of the first wave of this sample.

BIOPAREN 2003

Variables on the nationalities of parents have been corrected (see further documentation in Biography Data).

PGEN 2003  

MODE$$ and MONTH$$
Two new variables have been generated for all previous waves to describe interview method and month (MODE$$ or MONTH$$). See also the additional documentation.

$PSBIL
Update of $PSBIL: For foreigners, the category "leave without graduating" [code 6] had to be updated in 2000, which in turn made it necessary to update $BILZEIT, ISCED$$ and CASMIN$$.
Contact: Bettina Isengard

$FAMSTD
The variable for marital status has been updated.

HGEN 2003  

HMODE$$ and HMONTH$$
Two new variables were generated for all previous waves to describe interview method and month (HMODE$$ or HMONTH$$). See also the additional documentation.

PPFAD 2003  

GEBMONAT
The central demographic information in PPFAD has been expanded to the month of birth (variable GEBMONAT). This information is now collected for all adults and children as well (see further documentation in Biography Data).

Update of EINTRITT, ERSTBEFR, AUSTRITT, LETZTBEF (see further documentation).

BIOBIRTH 2003  

The information on women's birth biographies was expanded to include information from the Youth Questionnaire, which is given to 16-17 year-olds being interviewed for the first time instead of the standard biographical questionnaire (see further documentation in Biography Data).

BIOIMMIG 2003  

This data was corrected to fix a case of miscoding in past years that occurred due to a reversal of the item sequence. This applies to the variables BIEXPRLV, BIEXPRAC and BIEXPRAN (see further documentation in Biography Data).

PFLEGE 2003

The new variable PNRCARE is now available for the years since 1999, that is, for waves P - T. PNRCARE is an invariable number identifying the primary caregiver in a household. In three cases, the person identified as caregiver was identical with the person being cared for. In these cases, PNRCARE was set at -3 (implausible value). For the waves prior to 1999, PNRCARE has been assigned the value -2.
Contact: Rainer Pischner 
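
When PNRCARE is used to link a person in need of care to the caregiver's record, the two special values described above have to be filtered out first. A minimal Python sketch follows; the helper name is hypothetical, and treating any other negative value as missing as well is an assumption.

```python
PNRCARE_IMPLAUSIBLE = -3   # caregiver identical with the person being cared for
PNRCARE_NOT_SURVEYED = -2  # waves prior to 1999


def caregiver_id(pnrcare):
    """Return the caregiver's person number, or None if no valid link exists."""
    if pnrcare is None or pnrcare < 0:
        return None  # covers -2, -3 and (by assumption) other negative codes
    return pnrcare


for value in (12345, PNRCARE_IMPLAUSIBLE, PNRCARE_NOT_SURVEYED):
    print(value, "->", caregiver_id(value))
```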

YPBRUTTO 2003  

Revision of HHNRAKT and HHNROLD for persons listed doubly while living in a previous household.

$EQUIV 2003

All income data since 1984 is coded in EURO.

As a supplement to the annual income aggregates offered thus far, we now add the individual income components (sum of all income earned by all household members, variables I111xx$$) with consistent variable names over time.

All information missing due to item-non-response was imputed and marked using flag variables.

All income variables are also included for sample G, but standard weights were used on the basis of sub-samples A-F (see also the additional documentation).

 


Data Distribution 2002 (Wave S)

Rectypes 2002

1. HBRUTT02
In addition to the continuous, wave-specific gross (brutto) information on fieldwork progress (SPBRUTTO, SHBRUTTO), the file HBRUTT02 also includes the households of the new subsample G that were not surveyed. HBRUTT02 therefore contains all the households selected for subsample G, while the information on the surveyed households of subsample G can also be found in the continuous household brutto file SHBRUTTO. This matches the approach used for samples A (HBRUTT84), E (HBRUTT98) and F (HBRUTT00).

2. BIOSOC
The new data set BIOSOC contains youth information on everybody who has completed the biography questionnaire since 2000. This includes information such as arguments with parents, leisure activities, school grades and the federal state where they last attended school.
Contact: Thorsten Schneider

BIOJOB 2002

The data set BIOJOB contains detailed information on first jobs. As of now this also includes ISCO88 data, occupational scales, classification schemes (ISEI, SIOPS, EGP, MPS) as well as information about the sector (BRANCHE). Information regarding last jobs is a new addition and can be found in BIOJOB.
Contact: Thorsten Schneider 

BIOPAREN 2002 

The prestige scores for parents have been updated.

PGEN 2002 

AUTONO$$
This new variable is based on the answers to 'Occupational Status' and represents the degree of autonomy in a person's occupation.

STIB$$
This variable unifies the answers to 'Occupational Status' over all waves.

ISCED$$, CASMIN$$
The wave-specific files $PGEN have been retroactively (from 1984 onwards) expanded to include two further education variables (ISCED$$ and CASMIN$$), which are based on the international classification schemes ISCED (International Standard Classification of Education) and CASMIN (Comparative Analysis of Social Mobility in Industrial Nations), respectively. This will help improve comparisons of education-related analyses based on SOEP data.
Contact: Bettina Isengard

$EQUIV 2002 

Compared to the last data set, there have been fundamental changes to the handling of item non-response for annually-based income information and the aggregated income information contained in $PEQUIV. The established longitudinal procedure used for the imputation of item non-response has been expanded to include a purely cross-sectional imputation for all income variables, which, however, is only used when individual longitudinal information is unavailable. This has resulted in a complete replacement of all the missing income data in the $PEQUIV files (for further information on the methodological procedure for the additional imputation, cf. Frick, J.R. and Grabka, M. (2003): Missing Income Data in the GSOEP: Incidence, Imputation and its Impact on the Income Distribution).

As a result, all the so-called imputation flags have been revised. Each flag now gives the share of imputed income in the respective income aggregate: if all information is present the value is 0, and if item non-response is present the value may be anything up to 100.
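
The logic of the revised flags can be illustrated with a small Python calculation: the flag is the imputed part of an income aggregate expressed as a percentage of the whole aggregate. This is only a sketch of the idea. The function name, the rounding to whole percent, and the value 0 for an empty aggregate are assumptions, not the documented generation rule.

```python
def imputation_share(components):
    """Percentage (0-100) of an income aggregate that stems from imputed values.

    `components` is a sequence of (amount, was_imputed) pairs.
    """
    components = list(components)
    total = sum(amount for amount, _ in components)
    if total == 0:
        return 0  # assumed: nothing to impute in an empty aggregate
    imputed = sum(amount for amount, was_imputed in components if was_imputed)
    return round(100 * imputed / total)


household = [(1000, False), (500, True), (500, False)]
print(imputation_share(household))  # a quarter of the aggregate is imputed
```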

In addition, complete income information for the new sample F is now also available for the years 2000 to 2002.

The CNEF data is not yet available for the first wave of sample G, as the methodically demanding imputation algorithms applied by the SOEP require longitudinal data.

DM-EURO conversion

The income in $PEQUIV always refers to that of the previous year; this means that data collected in 2002 for the 2001 income year will still be in DM. There will be a conversion to Euros for all the $PEQUIV information in the next data distribution. Besides that, all the data contained in the $P files corresponds with the information collected with the original questionnaire, i.e. the data collected in Euros in 2002 or the data collected in DM in 2001 is respectively stored in the currency used in the questionnaire.
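
Until the converted files are available, users who need to compare amounts collected in DM with amounts collected in euro can apply the official fixed conversion rate of 1 euro = 1.95583 DM themselves. A minimal Python sketch with illustrative function names:

```python
DM_PER_EURO = 1.95583  # official fixed conversion rate


def dm_to_euro(amount_dm: float) -> float:
    """Convert an amount in Deutsche Mark to euro."""
    return amount_dm / DM_PER_EURO


def euro_to_dm(amount_euro: float) -> float:
    """Convert an amount in euro to Deutsche Mark."""
    return amount_euro * DM_PER_EURO


print(round(dm_to_euro(1000), 2))
```

Rounding is left to the caller so that sums of several components are not distorted by rounding each one separately.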

 


Data Distribution 2001 (Wave R)

With the 18th wave of the SOEP, the concept for the construction of cross-sectional weights has been changed slightly. This change affects neither the derivation of the staying probabilities nor the construction of the weights for Subsample D units.

For more details, please see the Newsletter 60, April 2003.

With the current release of SOEP data (survey years 1984-2001), the coding frame for industry and occupation (first and second job) has been changed to the international standard of NACE and ISCO88, respectively. Especially "old friends" among SOEP users should be aware that the variables ISCO$$, ISCOU$$, ISCOH$$ and $BRANCHE are no longer available. The respective new variables in the files $P and $PGEN are described in detail in the documentation of the generated variables at person level (see file pgen.pdf).

Other than that, the SOEP group at DIW is currently fixing some minor bugs and deficiencies in the current data release. Firstly, the variables TODJAHR and TODINFO in the file PPFAD, which give year of death and the source of death information, will include all mortality information as given by a recent follow-up study ("Verbleibstudie 2001") carried out by Infratest. Secondly, the variable $ERWZEIT in the file $PGEN will be updated so that there is valid information on the number of years with the current employer for all employed respondents in subsample C. Thirdly, the variable RP4002 in the file RP (occupational status: self-employed) and the variables RHHTAGIN, RHHMONIN, RHINTNR in RH (day and month of the interview as well as the interviewer's ID) had not been defined properly. All these problems will be fixed with the next release of data. However, users who need to use these variables should subscribe to our listserver so they will receive information about these updates sooner.


Data Distribution 2000 (Wave Q)

Rectypes 2000

1. VARIANZ
In addition to the household indicator this file contains the variables STRAT1, STRAT2, SAMPOINT and INTNR. Some software packages (such as STATA, SUDAAN) are able to use these to estimate variances. All four variables provide information on the respective subsample for the start of each first wave, i.e. they are saved at the case-level (variable HHNR).
STRAT1 identifies the strata that were relevant for drawing the Primary Sampling Units (PSUs) of the respective sample. For subsample B, these were the five nationalities. Therefore, "artificial" strata corresponding to those of the other subsamples were created for subsample B and stored in STRAT2.
The variable SAMPOINT identifies the respective PSU (e.g. in subsample A voting constituencies, in Subsample D not present).
Due to data protection laws the various values of the variables STRAT1, STRAT2 and SAMPOINT were given transformed values, in order to prevent regional units from being identified.
The variable INTNR assigns a number to every interviewer, so that clusters of households that were surveyed by the same interviewer can be identified.
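
To give an idea of how such design variables enter a variance estimate, the following Python sketch computes a weighted total and a common with-replacement ("ultimate cluster") approximation of its variance, using a stratum identifier (such as STRAT1 or STRAT2) and a PSU identifier (such as SAMPOINT). It is an illustration under simplifying assumptions, not the procedure of any particular software package: finite population corrections are ignored, strata with a single PSU contribute no variance, and cases without a PSU (as in subsample D) are not handled.

```python
from collections import defaultdict


def total_and_variance(records):
    """Weighted total and its approximate variance for a stratified cluster sample.

    `records` is an iterable of (stratum, psu, weight, value) tuples.
    """
    psu_totals = defaultdict(lambda: defaultdict(float))
    for stratum, psu, weight, value in records:
        psu_totals[stratum][psu] += weight * value

    total = 0.0
    variance = 0.0
    for psus in psu_totals.values():
        totals = list(psus.values())
        n = len(totals)
        total += sum(totals)
        if n < 2:
            continue  # assumption: single-PSU strata add no variance
        mean = sum(totals) / n
        # Between-PSU variation of the weighted PSU totals within the stratum.
        variance += n / (n - 1) * sum((t - mean) ** 2 for t in totals)
    return total, variance


sample = [
    ("s1", "a", 1, 10), ("s1", "a", 1, 20), ("s1", "b", 2, 5),
    ("s2", "c", 1, 4), ("s2", "d", 1, 8),
]
print(total_and_variance(sample))
```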

2. HBRUTT00
As with the 1998 supplementary sample (sample E), this file contains the complete gross ("Brutto") information on all households of the Innovation Sample that were newly drawn in 2000 using the random-route method, regardless of whether these households were successfully surveyed or not. This information can be used for methodological studies of household participation in (SOEP) surveys.

3. QJUGEND
In the year 2000, a youth questionnaire was introduced to replace the biography questionnaire for all "new" participants who had reached the minimum age of 16 and were therefore able to take part in the SOEP survey. The 232 records available so far supplement the information from these respondents' first individual questionnaire with retrospective details on education and basic indicators of educational success. In 2001, the youth questionnaire indicators were thoroughly revised and supplemented, and the youth participants of sample F answered this new questionnaire for the first time. As a result, the data set QJUGEND represents a kind of pre-test for the recently prepared biography data set BIOYOUTH (available from 2001 onwards).

Reworking of labels  

The VAR LABELS and VALUE LABELS have been completely reworked for all previous years (up to and including 1999). Missing labels were added where applicable and the labeling scheme was standardised (for instance for sub-items or variables with just one answer category). Furthermore, the labels were made consistent over time. At the same time, the reworked label text was transferred to the English labels, so that these too are now retrospectively fully identical to the German scheme.

$PGEN 2000

For the current data distribution, extensive revisions were made to the variables from earlier waves. For instance, note that there are far fewer missing values -1 (no answer) for many variables related to occupation. The education variables in all $PGEN files were reworked and supplemented. New variables include a differentiated labour force status for all participants and education information generated on the basis of data first collected in the year 2000 on the highest level of education and employment achieved to date. The existing generated education variables were retrospectively reworked, extrapolated, and supplemented: you will now be able to access data on temporarily absent respondents, as well as information on current school attendance, apprenticeship or studies. Furthermore, the variable BETR$$ in $PGEN was recoded (the data on the size of the firm and therefore the codes in SOEP have changed over time). Please take this into account when updating programs.

$PEQUIV 2000  

The $PEQUIV files were updated. This affects:

  • the extension of the population
  • the reworking of the variable IMPUTED RENT
  • new variables used to generate equivalence scales
  • a reworking of the variables related to ANNUAL WORKING HOURS



Data Distribution 1999 (Wave P)

Rectype 1999

INTERVIEW
This interviewer data set contains information on the sex, age, education, occupation and marital status of 1048 interviewers who worked on samples A, B, C and D in survey waves 1 to 12 (see the documentation).

For more information concerning the data distributions back to 1995 please refer to our German Site.