SOEP-Core v19 - Changes in the Dataset

Changes to the Dataset

Dataset Information

Rectypes 2002

In addition to the continuous, wave-specific brutto information on fieldwork progress (SPBRUTTO, SHBRUTTO), the households in the new subsample G that were not surveyed have been included in the file HBRUTT02. HBRUTT02 therefore contains all households selected for subsample G, while the information on the households that were surveyed for subsample G can also be found in the continuous household brutto file SHBRUTTO. This matches the approach used for samples A (HBRUTT84), E (HBRUTT98) and F (HBRUTT00).
Contact: Peter Krause

The new data set BIOSOC contains youth information on everybody who has completed the biography questionnaire since 2000. This includes information such as arguments with parents, leisure activities, school grades and the federal state where they last attended school.
Contact: Thorsten Schneider


The data set BIOJOB contains detailed information on first jobs. It now also includes ISCO88 data, occupational scales and classification schemes (ISEI, SIOPS, EGP, MPS) as well as information on the sector (BRANCHE). Information on last jobs is a new addition and can also be found in BIOJOB.
Contact: Thorsten Schneider 


The person to contact for the update of the Prestige-Scores for parents is Jürgen Schupp.

PGEN 2002 

This new variable is based on the answers to 'Occupational Status' and represents the degree of autonomy in a person's occupation.
Contact: Jürgen Schupp

This variable unifies the answers to 'Occupational Status' over all waves.
Contact: Jürgen Schupp

The wave-specific files $PGEN have been expanded retroactively (from 1984 onwards) to include two further education variables, $ISCED and $CASMIN. They are based on the international classification schemes ISCED (International Standard Classification of Education) and CASMIN (Comparative Analysis of Social Mobility in Industrial Nations), respectively. This should make education-related analyses based on SOEP data easier to compare.
Contact: Bettina Isengard
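
The "$" in names such as $PGEN stands for the wave-specific prefix. The following is a minimal sketch of how such a placeholder could be expanded into a concrete file name. It assumes that waves are lettered consecutively starting with "A" for 1984, which is consistent with the 2002 files SPBRUTTO and SHBRUTTO mentioned above. The function name wave_file_name is hypothetical and not part of any SOEP tool.

```python
import string

FIRST_WAVE_YEAR = 1984  # assumption: wave "A" is the 1984 survey year


def wave_file_name(template: str, year: int) -> str:
    """Replace the leading "$" placeholder with the wave letter for `year`."""
    if not template.startswith("$"):
        raise ValueError("template must start with the '$' wave placeholder")
    offset = year - FIRST_WAVE_YEAR
    if not 0 <= offset < len(string.ascii_uppercase):
        raise ValueError(f"no single-letter wave prefix for year {year}")
    return string.ascii_uppercase[offset] + template[1:]


print(wave_file_name("$PGEN", 2002))  # SPGEN
```

Looping over the years 1984 to 2002 with this helper would list every wave-specific file that received the new education variables.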

$PEQUIV 2002

Compared with the previous data release, the handling of item nonresponse has changed fundamentally for the annual income information and the aggregated income information contained in $PEQUIV. The established longitudinal procedure for imputing item nonresponse has been supplemented by a purely cross-sectional imputation for all income variables, which is used only when no longitudinal information is available for the individual. As a result, all missing income data in the $PEQUIV files have been replaced completely (for further information on the methodological procedure for the additional imputation cf. Frick, J.R. and Grabka, M. (2003): Missing Income Data in the GSOEP: Incidence, Imputation and its Impact on the Income Distribution).

As a result, all of the so-called imputation flags have been revised. Each flag now gives the share of imputed income in the respective income aggregate: the value is 0 if all information is present, and it can be anything up to 100 if any item nonresponse is present.
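
As an illustration of how such a flag can be read, the sketch below computes the imputed share of one income aggregate from its components. It is not the SOEP implementation: the function name, the (amount, was_imputed) input layout, and the treatment of a zero aggregate are assumptions made for this example.

```python
def imputation_share(components):
    """Return the percentage of an income aggregate that was imputed.

    `components` is a list of (amount, was_imputed) pairs (hypothetical layout).
    """
    total = sum(amount for amount, _ in components)
    if total == 0:
        return 0.0  # assumption: an aggregate of zero is flagged as 0
    imputed = sum(amount for amount, was_imputed in components if was_imputed)
    return 100.0 * imputed / total


observed_only = [(18000.0, False), (2000.0, False)]
partly_imputed = [(18000.0, False), (2000.0, True)]
print(imputation_share(observed_only))   # 0.0
print(imputation_share(partly_imputed))  # 10.0
```

Under these assumptions, a flag of 10 would mean that one tenth of the aggregate rests on imputed values, so analyses can filter or weight cases by how much of the income was actually observed.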

In addition, complete income information for the new sample F is now also available for the years 2000 to 2002.

The CNEF data are not yet available for the first wave of sample G, as the methodologically demanding imputation algorithms applied by the SOEP require longitudinal data.
Contact: Markus Grabka

DM-EURO conversion

The income in $PEQUIV always refers to the previous year; data collected in 2002 for the 2001 income year are therefore still in DM. All $PEQUIV information will be converted to euros in the next data distribution. Apart from that, all data in the $P files correspond to the information collected with the original questionnaire, i.e. data collected in euros in 2002 and data collected in DM in 2001 are each stored in the currency used in the questionnaire.
Contact: Peter Krause
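
Until the converted data are distributed, users who need DM and euro amounts on a common basis can convert them themselves. The sketch below uses the official fixed rate of 1 euro = 1.95583 DM. The function name dm_to_euro and the rounding to whole cents are assumptions for this example and may differ from the conversion applied in the next data distribution.

```python
from decimal import Decimal, ROUND_HALF_UP

DM_PER_EURO = Decimal("1.95583")  # official fixed conversion rate


def dm_to_euro(amount_dm):
    """Convert a DM amount to euros, rounded to cents (rounding is an assumption)."""
    euros = Decimal(str(amount_dm)) / DM_PER_EURO
    return euros.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


print(dm_to_euro(1955.83))  # 1000.00
```

Decimal is used instead of float so that the division and the rounding to cents are not affected by binary floating-point error.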