
SOEP-Core v25 - Changes in the Dataset


Dataset Information

The new dataset (Waves 1-25, 1984-2008) contains extensive improvements, additions, and modifications. Besides the usual wave-specific data (YPRUTTO, YP, YPKAL, YPGEN, YHBRUTTO, YH, YHGEN, YKIND, and XPLUECKE) and the updated files with a longitudinal component (PFAD files, biography files, spell data, and weighting factors), we now also provide, as a beta release, the data in a more user-friendly format called "SOEPlong". We announced this in SOEPnewsletter 80/2008 and thank all those who provided input on this issue.
This new, preliminary version of the SOEP data in long format can be obtained upon request. It contains all data and can thus essentially already be used for final analyses. However, we suggest that it be ordered only by "power users" who would like to work with us to improve data management, and we do not recommend the new format for inexperienced users. New SOEP users who want to work with the new format should at least be familiar with other panel datasets.
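
To illustrate what a "long" layout means in general, the following minimal sketch reshapes wave-prefixed (wide) records into one row per person and year. It is not the SOEPlong specification: the variable names (`persnr`, `xincome`, `yincome`) are hypothetical, and the mapping of wave letters to years (A = 1984, so X = 2007 and Y = 2008) is an assumption inferred from the file names listed above.

```python
from string import ascii_lowercase

FIRST_WAVE_YEAR = 1984  # assumption: wave letter "a" stands for 1984


def wave_year(letter):
    """Map a wave prefix letter to its survey year (a -> 1984, y -> 2008)."""
    return FIRST_WAVE_YEAR + ascii_lowercase.index(letter.lower())


def wide_to_long(wide_rows, id_key="persnr"):
    """Turn wave-prefixed columns into one row per person, year, and item."""
    long_rows = []
    for row in wide_rows:
        for name, value in row.items():
            if name == id_key:
                continue
            letter, item = name[0], name[1:]  # e.g. "yincome" -> "y", "income"
            long_rows.append(
                {id_key: row[id_key], "year": wave_year(letter),
                 "item": item, "value": value}
            )
    return sorted(long_rows, key=lambda r: (r[id_key], r["year"], r["item"]))


# Hypothetical wide records: one row per person, one column per wave.
wide = [
    {"persnr": 101, "xincome": 2100, "yincome": 2250},
    {"persnr": 102, "xincome": 1800, "yincome": 1750},
]
long_rows = wide_to_long(wide)
for r in long_rows:
    print(r)
```

In the long layout, the year is an ordinary column, so the same code can handle any number of waves without knowing the wave-specific variable names in advance.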

The most important improvements in the new data distribution are listed below:

1 New Datasets

1.1 Dataset BIOAGE06
The new data distribution contains the new file BIOAGE06. For the first time in 2008, it includes the information collected using the special mother-child questionnaire, usually from mothers of five- to six-year-old pre-schoolers. The data thus cover a birth cohort that was first “surveyed” in 2002/2003 with a special Newborn Questionnaire. The new data on pre-school-age children cover children’s height and weight, health, care situation, activities with and without the mother, and media usage. Detailed questions address the care situation. Furthermore, valid information is collected for the first time on the child’s personality (based on the “Big Five” personality traits indicator in the main questionnaire for adults) and socio-emotional behavior (surveyed with a modified version of the Strengths and Difficulties Questionnaire).

1.2 Dataset MOVEDIST
We provide a new dataset on changes of residence. Based on geo-coordinates at block level, it provides the distance (in meters) between the former and the present residence. However, this information is only available for moves since 2000 and is NOT available on this DVD! We distribute these data together with data on the spatial planning regions (ROR) on an extra CD-ROM. To use these data, you need an extended data distribution contract including a data protection concept. After signing your contract extension, you will receive the data on CD-ROM (at no additional cost).
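
For readers unfamiliar with coordinate-based distances, here is a minimal sketch of how a distance in meters between two residences could be derived from latitude and longitude. The method used for MOVEDIST is not documented here; the great-circle (haversine) formula, the Earth radius constant, and the example coordinates are all assumptions chosen for illustration.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters (assumed constant)


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


# Hypothetical former and present residence (approximate city coordinates).
former = (52.52, 13.40)
present = (53.55, 10.00)
print(round(haversine_m(*former, *present)), "meters")
```

A straight-line measure like this ignores road networks, so it should be read as a lower bound on the travel distance of a move.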

2 New Variables 

2.1 Dataset PPFAD

  • MIGBACK / MIGINFO: MIGBACK provides time-invariant information on an individual’s migration background, derived from the individual's own data and parental data. MIGINFO indicates the sources of the information used, in order to provide users with the highest possible transparency. A detailed description is available in the extensive biography documentation (see chapter on PPFAD).


2.2 Dataset PFLEGE

  • PAY / STUFE: two new variables on paid care (PAY) and the care level (STUFE) according to the German compulsory long-term care insurance.


2.3 Dataset PBIOSPE

The data generation process has been completely updated without changing its basic principles. As a result, there are only a few barely discernible deviations in the main variables (due to slight changes in the consistency checks of the data). There are, however, a number of visible changes in the form of additional variables or additional values in existing variables. A detailed description is available in our documentation on biography and life history data.

2.4 Dataset BIOPAREN

  • BIO: origin of information is $LELA or $JUGEND
  • ALTER / VALTER / MALTER: age of respondent / father / mother, all at the time of the biography interview.
  • Attention: A bug was discovered in the dataset shortly after the DVD was completed. To update the information on parental religious affiliation, please see our page Known Bugs/Fixes.


3 Revised Variables

3.1 Dataset PWEALTH and HWEALTH
In 2007, all individuals aged 17 and up were again surveyed on wealth, just as they were for the first time in 2002. These “raw” data were already part of the standard data distribution for Wave 24 and will be distributed with the upcoming data distribution in a file containing the data for 2002 and 2007 in “long format”: the file PWEALTH for individual data, and HWEALTH with data aggregated according to household context. Missing values due to item or partial unit non-response (e.g., missing interviews with individual household members in interviewed households) will be subjected to multiple imputation in complex procedures that take longitudinal information into account. Documentation on this is in preparation. An initial analysis of the new wealth data for 2002 and 2007 is provided in: Joachim R. Frick and Markus M. Grabka. 2009. Wealth Inequality on the Rise in Germany. Weekly Report 5 (10), 62-73.
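
The relationship between person-level and household-level wealth records can be pictured with the following minimal sketch, which sums long-format person rows by household and year. The field names (`hhnr`, `persnr`, `year`, `net_wealth`) and the figures are hypothetical, and the plain sum is an assumption: the actual aggregation rules and the handling of multiply imputed values in PWEALTH and HWEALTH are not described here.

```python
from collections import defaultdict


def aggregate_to_household(person_rows):
    """Sum person-level wealth within each (household, year) pair."""
    totals = defaultdict(float)
    for row in person_rows:
        totals[(row["hhnr"], row["year"])] += row["net_wealth"]
    return [
        {"hhnr": hhnr, "year": year, "net_wealth": total}
        for (hhnr, year), total in sorted(totals.items())
    ]


# Hypothetical person-level rows in long format (one row per person and year).
persons = [
    {"hhnr": 1, "persnr": 101, "year": 2002, "net_wealth": 40000.0},
    {"hhnr": 1, "persnr": 102, "year": 2002, "net_wealth": 15000.0},
    {"hhnr": 1, "persnr": 101, "year": 2007, "net_wealth": 52000.0},
    {"hhnr": 2, "persnr": 201, "year": 2007, "net_wealth": -3000.0},
]
households = aggregate_to_household(persons)
for h in households:
    print(h)
```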

3.2 Dataset $PEQUIV


3.3 Dataset HHRF/PHRF

  • $HHRF / $PHRF: We provide an update and revision of the post-stratification scheme and an additional regional calibration of SOEP weights.
    Short documentation


3.4 Dataset $PGEN

  • EMPLST$$: A new category has been added to this variable ("Employment status"). From 1998 on, the SOEP data contain information on working in a sheltered workshop for the disabled. Since these persons do not report whether they work full-time, part-time, or on an irregular basis, the new category "sheltered workshop" has been included.


3.5 Dataset $HGEN
The domicile-related variables in the wave-specific $HGEN files have been completely revised. New additions include the full imputation of missing values (due to item non-response) for the housing-related variables number of rooms, heating costs, and gross rent excluding heating, as well as for the newly generated variable on utility costs in addition to rent. Finally, “flag variables” show the imputation status where relevant. Experienced SOEP users may also note that various variable names in the file $HGEN have changed.
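
The idea of a flag variable can be shown with a minimal sketch that fills missing values and records which entries were filled. The median fill is only a stand-in, since the imputation procedure actually used for $HGEN is not described here; the example values and the 0/1 flag coding are likewise assumptions.

```python
from statistics import median


def impute_with_flag(values):
    """Fill missing entries (None) and return the values plus 0/1 flags.

    The median of the observed values is a placeholder fill rule; it
    raises StatisticsError if no value is observed at all.
    """
    observed = [v for v in values if v is not None]
    fill = median(observed)
    imputed = [fill if v is None else v for v in values]
    flags = [1 if v is None else 0 for v in values]  # 1 = value was imputed
    return imputed, flags


# Hypothetical number-of-rooms answers with two cases of item non-response.
rooms = [3, None, 4, 2, None]
rooms_imputed, rooms_flag = impute_with_flag(rooms)
print(rooms_imputed)
print(rooms_flag)
```

Keeping the flag alongside the filled value lets an analysis either use the completed variable or restrict itself to originally observed cases.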


3.6 Dataset PPFAD

  • TODJAHR / TODINFO: To separate panel mortality from demographic reasons for dropping out of the SOEP sample, TNS Infratest carried out several studies to determine the current residence of panel dropouts, i.e., earlier respondents who no longer take part in the SOEP. This entailed locating 17,195 persons. These investigations identified 981 cases in which the dropout had died. In total, however, 3,791 deaths had been identified in the SOEP by 2008 (see also the documentation on the variables TODJAHR and TODINFO in the file PPFAD). In addition, documentation in German is available from our fieldwork organization TNS Infratest (“Wiederbefragung von Panelausfällen”), along with an English-language summary.