The German Socio-Economic Panel (SOEP) study is a wide-ranging, representative longitudinal study of private households, located at the German Institute for Economic Research, DIW Berlin. The panel was started in 1984. Every year, nearly 15,000 households and more than 25,000 persons are sampled by the fieldwork organization TNS Infratest Sozialforschung. The data provide information on all household members, consisting of Germans living in the Eastern and Western German states, foreigners, and immigrants to Germany. Topics include household composition, occupational biographies, employment, earnings, health, and satisfaction indicators. As early as June 1990, even before the Economic, Social and Monetary Union, SOEP expanded to include the states of the former German Democratic Republic (GDR), thus seizing the rare opportunity to observe the transformation of an entire society. Immigrant samples were added in 1994/95 and 2013/2015 to account for the changes that took place in German society, and two samples of refugees were introduced in 2016. Further new samples were added in 1998, 2000, 2002, 2006, 2009, 2010, 2011, and 2012. The survey is constantly being adapted and developed in response to current social developments. The international version contains 95% of all cases surveyed (see 10.5684/soep.v33i.1).
Title: Socio-Economic Panel (SOEP), data from 1984-2016
Collection period: 1984-2016
Publication date: 2018-01-30
Principal investigators: Jürgen Schupp, Jan Goebel, Martin Kroh, Carsten Schröder, Charlotte Bartels, Klaudia Erhardt, Alexandra Fedorets, Andreas Franken, Marco Giesselmann, Markus Grabka, Peter Krause, Hannes Kröger, Simon Kühne, Maria Metzing, Jana Nebelin, David Richter, Diana Schacht, Paul Schmelzer, Christian Schmitt, Daniel Schnitzlein, Rainer Siegers, Knut Wenzig
The complete information is available via the DOI of the original data set: https://doi.org/10.5684/soep.v33.
Data set information:
|Number of units||126,151|
|Number of variables||72,709 in 439 data sets|
|Data format||STATA, SPSS, SAS, CSV|
|Distribution format||zip file|
|Stata bilingual||dfe399ba3879874dbdd0096b58cbd90f||| TXT, 19.29 KB|
|Stata German||9cbe419645ee17bdb5265df5a5662802||| TXT, 19.29 KB|
|Stata English||3a195c128e21b732d8b1f0ff64316b35||| TXT, 19.29 KB|
|SPSS German||33763b1f68c54f790d9826b4923ac276||| TXT, 19.29 KB|
|SPSS English||0454017269b9f5601d3fe30ace13211f||| TXT, 19.29 KB|
|SAS German||84c5124b696a552340b1d7bca79c8c15||| TXT, 21.53 KB|
|SAS English||e6cb205a9d2abec3a37872f1dbf2a6e8||| TXT, 21.53 KB|
|CSV||df524ba26e46b42ff77dd6991046485d||| TXT, 19.29 KB|
|GGKBOU||1fd60d2f3f1a405d508cf472ff916cc9||| TXT, 140 Byte|
|GGKBOU English||67c43e2e72aab736e6c6dafb75da57f5||| TXT, 140 Byte|
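The 32-character hexadecimal strings in the table above have the length of MD5 digests. The source does not say so explicitly, so the following is only a minimal sketch, assuming they are MD5 checksums of the corresponding distribution zip files. The function names are hypothetical helpers, not part of any SOEP tooling.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hexadecimal MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with Path(path).open("rb") as handle:
        # Read fixed-size chunks so large archives need not fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_checksum(path, published_hex):
    """Compare a file's MD5 digest with a published value, ignoring case
    and surrounding whitespace (assumption: the published value is MD5)."""
    return md5_of_file(path) == published_hex.strip().lower()
```

With a downloaded archive (the file name here is made up), a check could look like `matches_published_checksum("soep_stata_english.zip", "3a195c128e21b732d8b1f0ff64316b35")`. A result of `False` would suggest an incomplete or altered download, provided the MD5 assumption holds.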
Publications using this file should refer to the above DOI and cite one of the following references.
While preparing for the next wave of the IAB-BAMF-SOEP Survey of Refugees, the survey institute determined that an interviewer had not conducted interviews correctly, affecting six percent of the household interviews in the sample. These households were removed from the dataset, but are available upon request for survey methodological analysis at a guest work station at the SOEP Research Data Center. In addition to deleting the corresponding rows from all affected datasets, we also made the following modifications:
Datasets from the current BG wave contained errors in the assignment of interviewer IDs. These were corrected.
Inconsistencies between key variables on population assignment in the PPFAD and $$KIND datasets were corrected. There was an error of one year in the definition of the target population in the $$KIND datasets from 2014 to 2016. In some cases, this led to a lack of information on the year of birth in files on children:
These corrections also affect the number of cases in the file KIDLONG, which was corrected correspondingly.
3.1 Change in the $$NETTO codes in 96 cases (children) in the years 2014-2016
In the process of data checks, the $$NETTO codes in PPFAD were also compared and corrected. In survey years 2014 to 2016, some children had been incorrectly assigned the code 20 instead of 30 on the variable $$NETTO in the PPFAD dataset. This error has been corrected in v33.1. The update also made it necessary to correct person weights in the affected survey years (dataset PHRF), because the variable $$NETTO determines which individuals in interviewed households are assigned a valid weight. The updated weight is also contained in v33.1.
In BIOAPREN, a number of missing values in the flag variables for parental (professional) education and the years of death of the parents were updated and filled in.
The algorithm for imputation of missing dates in the spells was optimized. As a result, in v33.1, the imputed variables and the variables derived from them were changed, specifically all variables with the suffix _imp and the variable staytime. The changes affected a total of 349 of 15,640 spells.
The variable AUSB16 (“profession requires vocational training”) from BGPGEN was updated. The correction substantially decreased the number of missing values [-1].
1. Dataset $PGEN: Variable casmin$$
A missing parenthesis in programming led to individuals in CASMIN category 6 (“(2c_gen) general maturity certificate”) being mistakenly placed in CASMIN category 7 ("(2c_voc) vocational maturity certificate").
For wave BG, this means that of the 4,553 observations in category 7, 1,976 actually belong in category 6 and 2,577 in category 7.
This can be corrected using the existing variables in the $PGEN data. For wave BG, it can be done as follows:
* Category 6: (2c_gen) general maturity certificate
replace casmin16 = 6 if inlist(bgpsbil,3,4) | bgpsbila==3 | bgpsbilo==3
* Category 7: (2c_voc) vocational maturity certificate
replace casmin16 = 7 if (inlist(bgpsbil,3,4) | bgpsbila==3 | bgpsbilo==3) & (inlist(bgpbbila,2,3,5,6,8) | (bgpbbil01>=1 & bgpbbil01<.) | (bgpbbilo>=1 & bgpbbilo<.))
* Categories 8 and 9: tertiary education
replace casmin16 = 8 if inlist(bgpbbil02,1,4)
replace casmin16 = 9 if inlist(bgpbbil02,2,3,5,6,7,8) | inlist(bgpbbila,4,7,9)
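The source does not say which parenthesis was missing. The following sketch shows one way such an error can arise, under that stated assumption: in Stata, `&` binds more tightly than `|`, just as `and` binds more tightly than `or` in Python, so dropping the grouping parentheses around a chain of alternatives changes which cases satisfy the condition. The function and argument names are hypothetical simplifications of the conditions in the correction.

```python
def category_7_intended(abitur, abitur_abroad, vocational_degree):
    """Category 7 requires a maturity certificate AND a vocational degree."""
    return (abitur or abitur_abroad) and vocational_degree


def category_7_unparenthesized(abitur, abitur_abroad, vocational_degree):
    """The same expression without the grouping parentheses.

    `and` binds more tightly than `or`, so this is evaluated as
    abitur or (abitur_abroad and vocational_degree).
    """
    return abitur or abitur_abroad and vocational_degree


# A person with a general maturity certificate and no vocational degree.
person = dict(abitur=True, abitur_abroad=False, vocational_degree=False)
print(category_7_intended(**person))         # False: remains in category 6
print(category_7_unparenthesized(**person))  # True: wrongly moved to category 7
```

In this illustration, everyone with a general maturity certificate ends up in category 7 regardless of vocational qualification, which matches the kind of misplacement described for the 1,976 observations in wave BG.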
2. Dataset [BE-BG]PGEN: Variable [be-bg]pbbila ("Vocational Degree Outside Germany")
The variable $$pbbila (foreign degrees, vocational education) in SOEP v33 was expanded retrospectively to include information on whether the degree had been completed. This revision failed, however, to take into account some of the information covered in certain modules. A correction can be made with the existing variables in the $PGEN data, as shown in the accompanying statement file (TXT, 2.72 KB).
|bepgen||bepbbila||Vocational Degree Outside Germany|
|bfpgen||bfpbbila||Vocational Degree Outside Germany|
|bgpgen||bgpbbila||Vocational Degree Outside Germany|
3. Dataset BIOAGEL: Variable bioage
In the dataset BIOAGEL, the data type of the variable bioage was not adjusted. The variable shows which questionnaire the row of data was taken from. Because bioage has included values > 99 since v33, values > 99 were cut off in Stata. The cut-off values are:
4. Dataset CIRDEF: Variable rgroup
The variable rgroup divides the SOEP sample into 20 equally sized groups. It is used to select the 50% sample. Since the new samples M3 and M4 were incorrectly assigned, there are no cases from these samples in the teaching version of the SOEP data.
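The source does not describe how rgroup is constructed or which groups make up the 50% sample. The sketch below is therefore only an illustration of the general idea, assuming a round-robin assignment over sorted household IDs and a selection of groups 1 to 10. All names and the value 0 used for a mis-assigned household are hypothetical.

```python
def assign_rgroups(household_ids, n_groups=20):
    """Assign each household to one of n_groups groups of near-equal size.

    Round-robin over the sorted IDs is an illustrative assumption,
    not the SOEP procedure.
    """
    ordered = sorted(household_ids)
    return {hid: (i % n_groups) + 1 for i, hid in enumerate(ordered)}


def select_subsample(rgroup, keep_groups):
    """Return the sorted IDs whose group number is in keep_groups."""
    keep = set(keep_groups)
    return sorted(hid for hid, group in rgroup.items() if group in keep)


rgroup = assign_rgroups(range(1000, 1200))                 # 200 households
half = select_subsample(rgroup, keep_groups=range(1, 11))  # 10 of 20 groups
print(len(half))  # 100

# A household whose group value lies outside the selected groups
# (hypothetical stand-in for an incorrect assignment) is never selected.
rgroup[9999] = 0
print(9999 in select_subsample(rgroup, keep_groups=range(1, 11)))  # False
```

Under these assumptions, any sample whose rgroup values never fall into the selected groups contributes no cases to the subsample, which is the effect reported for M3 and M4 in the teaching version.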
Individual (PAPI) 2016: Field-de Field-en Var-de Var-en
Household (PAPI) 2016: Field-de Field-en Var-de Var-en
Biography (PAPI) 2016: Field-de Var-de Var-en
Catch-up Individual 2016: Field-de Var-de Var-en
Youth (16-17-year-olds, A-L1) 2016: Field-de Var-de Var-en
Early Youth (13-14-year-olds) 2016: Field-de Var-de Var-en
Pre-teen (11-12-year-olds) 2016: Field-de Var-de Var-en
Mother and Child (Newborns) 2016: Field-de Var-de Var-en
Mother and Child (2-3-year-olds) 2016: Field-de Var-de Var-en
Mother and Child (5-6-year-olds) 2016: Field-de Var-de Var-en
Parents and Child (7-8-year-olds) 2016: Field-de Var-de Var-en
Mother and Child (9-10-year-olds) 2016: Field-de Var-de Var-en
Deceased Individual 2016: Field-de Var-de Var-en
Grip Strength 2016: Field-de
Please find all sample specific questionnaires of this year and all questionnaires of previous years on this site
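15) Coding of open-ended responses on occupational activity according to the International Standard Classification of Occupations 2008 (ISCO08): direct coding, procedure and decision rules for ambiguous responses (German original: Die Vercodung der offenen Angaben zur beruflichen Tätigkeit nach der International Standard Classification of Occupations 2008 (ISCO08) - Direktvercodung - Vorgehensweise und Entscheidungsregeln bei nicht eindeutigen Angaben)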
All documentation for filtering can be found on this page