SOEP-Core v33.1 - Dataset Information

The German Socio-Economic Panel (SOEP) is a wide-ranging, representative longitudinal study of private households, located at the German Institute for Economic Research (DIW Berlin). Every year, nearly 15,000 households and more than 25,000 persons were sampled by the fieldwork organization TNS Infratest Sozialforschung. The data provide information on all household members, including Germans living in the eastern and western German states, foreigners, and immigrants to Germany. The panel started in 1984. Its many topics include household composition, occupational biographies, employment, earnings, health, and satisfaction indicators. As early as June 1990, even before the Economic, Social and Monetary Union, SOEP expanded to include the states of the former German Democratic Republic (GDR), seizing the rare opportunity to observe the transformation of an entire society. Immigrant samples were added in 1994/95 and 2013/2015 to account for the changes that took place in German society, and two samples of refugees were introduced in 2016. Further new samples were added in 1998, 2000, 2002, 2006, 2009, 2010, 2011, and 2012. The survey is continually adapted and developed in response to current social developments. The international version contains 95% of all cases surveyed (see 10.5684/soep.v33i.1).

Dataset Information

Title: Socio-Economic Panel (SOEP), data from 1984-2016

DOI: 10.5684/soep.v33.1
Collection period: 1984-2016
Publication date: 2018-01-30
Principal investigators: Jürgen Schupp, Jan Goebel, Martin Kroh, Carsten Schröder, Charlotte Bartels, Klaudia Erhardt, Alexandra Fedorets, Andreas Franken, Marco Giesselmann, Markus Grabka, Peter Krause, Hannes Kröger, Simon Kühne, Maria Metzing, Jana Nebelin, David Richter, Diana Schacht, Paul Schmelzer, Christian Schmitt, Daniel Schnitzlein, Rainer Siegers, Knut Wenzig

The complete information is available via the DOI of the original data set.

Data set information:

Number of units: 126,151
Number of variables: 72,709 in 439 data sets
Data formats: Stata, SPSS, SAS, CSV

MD5 fingerprints

Distribution format | MD5 (zip file) | MD5 (all files)
Stata bilingual | dfe399ba3879874dbdd0096b58cbd90f | TXT, 19.29 KB
Stata German | 9cbe419645ee17bdb5265df5a5662802 | TXT, 19.29 KB
Stata English | 3a195c128e21b732d8b1f0ff64316b35 | TXT, 19.29 KB
SPSS German | 33763b1f68c54f790d9826b4923ac276 | TXT, 19.29 KB
SPSS English | 0454017269b9f5601d3fe30ace13211f | TXT, 19.29 KB
SAS German | 84c5124b696a552340b1d7bca79c8c15 | TXT, 21.53 KB
SAS English | e6cb205a9d2abec3a37872f1dbf2a6e8 | TXT, 21.53 KB
CSV | df524ba26e46b42ff77dd6991046485d | TXT, 19.29 KB
GGKBOU | 1fd60d2f3f1a405d508cf472ff916cc9 | TXT, 140 bytes
GGKBOU English | 67c43e2e72aab736e6c6dafb75da57f5 | TXT, 140 bytes

Teaching versions | MD5 (zip file)
Stata German | 3ecf547c653dfac561cb618c306972c8
Stata English | 598ba143e4d7115fcc183dd1517af0d1
SPSS German | 0f6ffcfcbdf0982afe48582603e20f97
SPSS English | 96adf7fef897ddb346253598a9e93242
SAS German | 16a66eacf4032b2ba8fe55f5e242bc3f
SAS English | d921f61ee31459a4b54ea74d0dda9d10
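To check a download against the fingerprints listed above, it is enough to compute the MD5 digest of the local file and compare it with the published value. The Python sketch below assumes that the fingerprint shown next to each distribution format is the MD5 of that distribution's zip file; the local file names in EXPECTED_MD5 are hypothetical placeholders, not official SOEP file names.

```python
import hashlib
from collections.abc import Iterable
from pathlib import Path

# Hypothetical local file names; the fingerprints are copied from the list above
# and assumed to be the MD5 of each distribution's zip file.
EXPECTED_MD5 = {
    "soep_v33.1_stata_en.zip": "3a195c128e21b732d8b1f0ff64316b35",
    "soep_v33.1_csv.zip": "df524ba26e46b42ff77dd6991046485d",
}


def md5_of_chunks(chunks: Iterable[bytes]) -> str:
    """Feed byte chunks into one MD5 digest and return it as lowercase hex."""
    digest = hashlib.md5()
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest()


def md5_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so large zip archives need not fit in memory."""
    with path.open("rb") as handle:
        return md5_of_chunks(iter(lambda: handle.read(chunk_size), b""))


def matches(actual: str, expected: str) -> bool:
    """Compare two hex digests, ignoring case and surrounding whitespace."""
    return actual.strip().lower() == expected.strip().lower()


if __name__ == "__main__":
    for name, expected in EXPECTED_MD5.items():
        path = Path(name)
        if not path.exists():
            print(f"{name}: not found, skipped")
            continue
        status = "OK" if matches(md5_of_file(path), expected) else "MISMATCH"
        print(f"{name}: {status}")
```

Reading in chunks keeps memory use flat regardless of archive size; a mismatch usually means an incomplete or corrupted download.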


  • Goebel, Jan, Markus M. Grabka, Stefan Liebig, Martin Kroh, David Richter, Carsten Schröder, and Jürgen Schupp (2018): The German Socio-Economic Panel Study (SOEP). Jahrbücher für Nationalökonomie und Statistik / Journal of Economics and Statistics (online first). doi: 10.1515/jbnst-2018-0022.
  • Wagner, Gert G., Joachim R. Frick, and Jürgen Schupp (2007): The German Socio-Economic Panel Study (SOEP): Scope, Evolution and Enhancements. Schmollers Jahrbuch (Journal of Applied Social Science Studies) 127 (1), 139-169.
  • Schupp, Jürgen (2009): 25 Jahre Sozio-oekonomisches Panel: Ein Infrastrukturprojekt der empirischen Sozial- und Wirtschaftsforschung in Deutschland. Zeitschrift für Soziologie 38 (5), 350-357.
  • Wagner, Gert G., Jan Göbel, Peter Krause, Rainer Pischner, and Ingo Sieber (2008): Das Sozio-oekonomische Panel (SOEP): Multidisziplinäres Haushaltspanel und Kohortenstudie für Deutschland. Eine Einführung (für neue Datennutzer) mit einem Ausblick (für erfahrene Anwender). AStA Wirtschafts- und Sozialstatistisches Archiv 2 (4), 301-328.

SOEP-Core soep.v33.1

1 Deletion of incorrectly conducted interviews in the IAB-BAMF-SOEP Survey of Refugees

While preparing the next wave of the IAB-BAMF-SOEP Survey of Refugees, the survey institute found that one interviewer had not conducted interviews correctly; this affected six percent of the household interviews in the sample. These households were removed from the dataset but remain available on request for survey-methodological analysis at a guest workstation at the SOEP Research Data Center. In addition to deleting the corresponding rows from all affected datasets, we made the following modifications:

  • Because household and individual interviews were deleted, the weights in the datasets HHRF and PHRF were updated to account for the slightly reduced number of cases in survey year 2016.
  • The new weights were also updated in, or added to, the dataset BGPEQUIV.
  • The imputation of monthly household net income (I[1-5]HINC16) was redone for this sample, both in BGHGEN and in the dataset MIHINC.

2 Update INTID in BG files

Datasets from the current wave (BG) contained errors in the assignment of interviewer IDs (INTID). These have been corrected.

3 Corrected number of entries in $$KIND (2014-2016)

Inconsistencies between the key variables on population assignment in the PPFAD and $$KIND datasets were corrected. In the $$KIND datasets for 2014 to 2016, the target population was defined with an error of one year. In some cases, this led to missing information for the following birth years in the children's files:

    • bekgjahr: 1998 for all samples
    • bfkgjahr: 1999 for all samples
    • bgkgjahr: 1999 only for samples M3 and M4 in 2016

These corrections also affect the number of cases in the file KIDLONG, which has been corrected accordingly.

3.1 Change in the $$NETTO codes in 96 cases (children) in the years 2014-2016

In the course of data checks, the $$NETTO codes in PPFAD were also compared and corrected. In survey years 2014 to 2016, some children had been incorrectly assigned code 20 instead of 30 on the variable $$NETTO in the PPFAD dataset; this has been corrected in v33.1. Because $$NETTO determines which individuals in interviewed households are assigned a valid weight, the person weights for the affected survey years (dataset PHRF) also had to be corrected. The updated weights are included in v33.1.


In BIOPAREN, a number of missing values in the flag variables for the parents' (vocational) education and in the parents' years of death were updated and filled in.


The algorithm for imputing missing dates in the spells was optimized. As a result, the imputed variables and the variables derived from them changed in v33.1, specifically all variables with the suffix _imp and the variable staytime. The changes affect a total of 349 of 15,640 spells.

6 Update AUSB16 in BGPGEN

The variable AUSB16 (“profession requires vocational training”) in BGPGEN was updated. The correction substantially reduces the number of missing values (code -1).

1. Dataset $PGEN: Variable casmin$$

A missing parenthesis in the generation code caused individuals in CASMIN category 6 (“(2c_gen) general maturity certificate”) to be placed mistakenly in CASMIN category 7 (“(2c_voc) vocational maturity certificate”).

For wave BG, this means that of the 4,553 observations in category 7, 1,976 actually belong in category 6 and 2,577 in category 7.

This can be corrected with the existing variables in the $PGEN data. For wave BG, the correction is as follows:

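* Corrected CASMIN coding for wave BG; run with bgpgen (containing casmin16) loaded.
* Later statements overwrite earlier ones, so their order matters.
replace casmin16 = 6 if inlist(bgpsbil,3,4) | bgpsbila==3 | bgpsbilo==3

* The parentheses around the school-degree conditions are essential: & binds more tightly than |.
replace casmin16 = 7 if (inlist(bgpsbil,3,4) | bgpsbila==3 | bgpsbilo==3) & (inlist(bgpbbila,2,3,5,6,8) | (bgpbbil01>=1 & bgpbbil01<.) | (bgpbbilo>=1 & bgpbbilo<.))

replace casmin16 = 8 if inlist(bgpbbil02,1,4)

replace casmin16 = 9 if inlist(bgpbbil02,2,3,5,6,7,8) | inlist(bgpbbila,4,7,9)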
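Why a single missing parenthesis moves people from category 6 to category 7 follows from operator precedence: in Stata, & binds more tightly than |, just as Python's `and` binds more tightly than `or`. The faulty statement itself is not shown here, so the Python sketch below assumes, purely for illustration, that the missing parentheses were the ones around the three school-degree conditions in the category-7 statement. The boolean arguments are hypothetical stand-ins for the Stata conditions, not SOEP variables.

```python
def fires_category_7(school_a: bool, school_b: bool, school_c: bool,
                     vocational: bool, *, parenthesized: bool = True) -> bool:
    """Evaluate the category-7 condition with or without the outer parentheses.

    school_a, school_b, school_c stand in for inlist(bgpsbil,3,4), bgpsbila==3
    and bgpsbilo==3; vocational stands in for the whole vocational-training test.
    """
    if parenthesized:
        return (school_a or school_b or school_c) and vocational
    # Hypothetical faulty form: parses as school_a or school_b or (school_c and vocational)
    return school_a or school_b or school_c and vocational


def casmin_6_or_7(school_a: bool, school_b: bool, school_c: bool,
                  vocational: bool, *, parenthesized: bool = True):
    """Mimic the two replace statements: assign 6 first, then overwrite with 7."""
    code = None
    if school_a or school_b or school_c:
        code = 6
    if fires_category_7(school_a, school_b, school_c, vocational,
                        parenthesized=parenthesized):
        code = 7
    return code


if __name__ == "__main__":
    # General maturity certificate (school_a) without vocational training
    print(casmin_6_or_7(True, False, False, False))                       # 6
    print(casmin_6_or_7(True, False, False, False, parenthesized=False))  # 7
```

Without the parentheses, anyone meeting the first school-degree condition is coded 7 whether or not they have vocational training, which matches the symptom described above.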

2. Dataset [BE-BG]PGEN: Variable [be-bg]pbbila ("Vocational Degree Outside Germany")

The variable $$pbbila (foreign degrees: vocational education) was expanded retrospectively in SOEP v33 to include information on whether the degree had been completed. However, this revision did not take into account some of the information covered in certain modules. A correction can be made with the existing variables in the $PGEN data; the corresponding statement is available as a TXT file (2.72 KB).

Dataset | Variable | Variable label
bepgen | bepbbila | Vocational Degree Outside Germany
bfpgen | bfpbbila | Vocational Degree Outside Germany
bgpgen | bgpbbila | Vocational Degree Outside Germany

3. Dataset BIOAGEL: Variable bioage

In the dataset BIOAGEL, the storage type of the variable bioage was not adjusted. This variable indicates which questionnaire a row of data was taken from. Because bioage has included values above 99 since v33, values above 99 were cut off in Stata. The affected values are:

Variable | Value | Label
bioage | 101 | “bioage10a”
bioage | 102 | “bioage10b (only FID)”
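Stata's integer storage types have fixed ranges: byte holds -127 to 100, int -32,767 to 32,740, and long -2,147,483,647 to 2,147,483,620. The Python sketch below assumes the truncation arose because bioage was kept in a type too small for codes such as 101 and 102 (byte is the obvious candidate, although its documented maximum is 100 rather than 99). It picks the smallest type that holds all codes and lists the values a given type cannot store; the function names and the example codes are hypothetical.

```python
# Value ranges of Stata's integer storage types (per Stata's documentation of data types).
STATA_INT_TYPES = (
    ("byte", -127, 100),
    ("int", -32_767, 32_740),
    ("long", -2_147_483_647, 2_147_483_620),
)


def smallest_int_type(values) -> str:
    """Return the smallest Stata integer type able to hold every value."""
    lo, hi = min(values), max(values)
    for name, type_min, type_max in STATA_INT_TYPES:
        if type_min <= lo and hi <= type_max:
            return name
    raise ValueError("values exceed the range of long; a double would be needed")


def values_lost_as(type_name: str, values) -> list:
    """List the distinct values that fall outside the range of the given type."""
    _, type_min, type_max = next(t for t in STATA_INT_TYPES if t[0] == type_name)
    return sorted(v for v in set(values) if not type_min <= v <= type_max)


if __name__ == "__main__":
    codes = [1, 2, 99, 101, 102]  # illustrative codes, including the two affected ones
    print(smallest_int_type(codes))       # int
    print(values_lost_as("byte", codes))  # [101, 102]
```

A check like this, run before a variable's codes are extended, flags when a wider storage type is needed.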

4. Dataset CIRDEF: Variable rgroup

The variable rgroup divides the SOEP sample into 20 groups of equal size and is used to select the 50% sample. Because the new samples M3 and M4 were assigned incorrectly, the teaching version of the SOEP data contains no cases from these samples.
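With 20 equally sized groups, a 50% sample amounts to keeping 10 of the rgroup values. The Python sketch below illustrates that selection together with a per-sample count that would reveal a sample missing from the subsample. Which ten groups SOEP actually keeps is not stated here, so SELECTED_GROUPS is a hypothetical choice, and the incorrect assignment of M3/M4 is modeled, for illustration only, as a missing rgroup.

```python
from collections import Counter

# Hypothetical: the 50% sample keeps 10 of the 20 rgroup values (here 1-10).
SELECTED_GROUPS = frozenset(range(1, 11))


def half_sample(persons):
    """Keep the persons whose rgroup is one of the selected groups."""
    return [p for p in persons if p["rgroup"] in SELECTED_GROUPS]


def cases_by_sample(persons) -> Counter:
    """Count persons per sample, e.g. to spot samples absent from a subsample."""
    return Counter(p["sample"] for p in persons)


if __name__ == "__main__":
    # 200 made-up persons spread evenly over rgroup 1-20, plus 40 M3 persons
    # whose rgroup is (hypothetically) missing.
    persons = [{"sample": "A", "rgroup": i % 20 + 1} for i in range(200)]
    persons += [{"sample": "M3", "rgroup": None} for _ in range(40)]
    print(cases_by_sample(half_sample(persons)))  # Counter({'A': 100})
```

Comparing cases_by_sample before and after selection makes a sample that drops out entirely easy to spot.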

Individual (PAPI) 2016: Field-de Var-de Var-en
Household (PAPI) 2016: Field-de Var-de Var-en
Biography (PAPI) 2016: Field-de Var-de Var-en
Catch-up Individual 2016: Field-de Var-de Var-en
Youth (16-17-year-olds, A-L1) 2016: Field-de Var-de Var-en
Early Youth (13-14-year-olds) 2016: Field-de Var-de Var-en
Pre-teen (11-12-year-olds) 2016: Field-de Var-de Var-en
Mother and Child (Newborns) 2016: Field-de Var-de Var-en
Mother and Child (2-3-year-olds) 2016: Field-de Var-de Var-en
Mother and Child (5-6-year-olds) 2016: Field-de Var-de Var-en
Parents and Child (7-8-year-olds) 2016: Field-de Var-de Var-en
Mother and Child (9-10-year-olds) 2016: Field-de Var-de Var-en
Deceased Individual 2016: Field-de Var-de Var-en
Grip Strength 2016: Field-de

All sample-specific questionnaires for this year and all questionnaires from previous years are available on this site.

1) Sampling, Nonresponse, and Integrated Weighting of the 2016 IAB-BAMF-SOEP Survey of Refugees (M3/M4) – revised version

2) SOEP-Core – Documentation of Sample Sizes and Panel Attrition (1984 until 2016)

3) SOEP-Core v33.1 – Biographical Information in the Meta File PPFAD (Month of Birth, Year of Death, Immigration Variables, Living in East or West Germany in 1989)

4) SOEP-Core v33.1 – PPFAD

5) SOEP-Core v33.1 – Documentation of the Household-related Meta-dataset HPFAD

6) SOEP-Core v33.1 – $PBRUTTO

7) SOEP-Core v33.1 – $HBRUTTO

8) SOEP-Core v33.1 – Documentation of Person-related Status and Generated Variables in $PGEN

9) SOEP-Core v33.1 – Documentation of Household-related Status and Generated Variables in $HGEN

10) SOEP 2016 – Codebook for the $PEQUIV File 1984-2016: CNEF Variables with Extended Income Information for the SOEP

11) SOEP-Core v33.1 – BIOIMMIG: Generated Variables for Foreign Nationals, Immigrants, and Their Descendants in the SOEP

12) SOEP-Core v33.1 – HEALTH

13) SOEP-Core v33.1 – BIOPAREN: Biography Information for the Parents of SOEP-Respondents

14) SOEP-Core v33.1 – BIOAGEL: Generated Variables from the “Mother & Child”, “Parent”, and “Pupils” Questionnaires

15) SOEP-Core v33.1 – BIOSIB: Information on Siblings in the SOEP

16) SOEP-Core v33.1 – The Couple History Files BIOCOUPLM and BIOCOUPLY, and Marital History Files BIOMARSM and BIOMARSY

17) SOEP-Core v33.1 – BIOAGE17: The Youth Questionnaire

18) SOEP-Core v33.1 – BIOSOC: Retrospective Data on Youth and Socialization

19) SOEP-Core v33.1 – BIOJOB: Detailed Information on First and Last Job

20) SOEP-Core v33.1 – BIOEDU: Data on Educational Participation and Transitions

21) SOEP-Core v33.1 – BIORESID: Variables on Occupancy and Second Residence

22) SOEP-Core v33.1 – BIOBIRTH: A Data Set on the Birth Biography of Male and Female Respondents

23) SOEP-Core v33.1 – BIOTWIN: TWINS in the SOEP

24) SOEP-Core v33 – INTERVIEWER: Detailed Information on SOEP Interviewers

25) SOEP-Core v33.1 – LIFESPELL: Information on the Pre- and Post-Survey History of SOEP-Respondents

26) SOEP-Core v33.1 – MIGSPELL and REFUGSPELL: The Migration-Biographies of Samples M1/M2 and M3/M4

27) SOEP-Core v33.1 – Activity Biography in the Files PBIOSPE and ARTKALEN

1) Handgreifkraftmessung im Sozio-oekonomischen Panel (SOEP) 2006 und 2008

2) Documentation on ISCED Generation Using the CAMCES Tool in the IAB-SOEP Migration Samples M1/M2

3) The new IAB-SOEP Migration Sample: an introduction into the methodology and the contents

4) The Request for Record Linkage in the IAB-SOEP Migration Sample

5) Flowcharts for the Integrated Individual-Biography Questionnaire of the IAB-SOEP Migration Sample 2013

6) The Measurement of Labor Market Entries with SOEP Data: Introduction to the Variable EINSTIEG_ARTK

7) Job submission instructions for the SOEPremote System at DIW Berlin – Update 2014

8) SOEP 2015 – Informationen zu den SOEP-Geocodes in SOEP v32

9) Editing and Multiple Imputation of Item Non-response in the Wealth Module of the German Socio-Economic Panel

10) Die Vercodung der offenen Angaben zu den Ausbildungsberufen im Sozio-Oekonomischen Panel

11) Das Studiendesign der IAB-BAMF-SOEP Befragung von Geflüchteten

12) Scales Manual IAB-BAMF-SOEP Survey of Refugees in Germany – revised version

13) SOEP 2010 – Preparation of data from the new SOEP consumption module: Editing, imputation, and smoothing

14) SOEP Scales Manual (updated for SOEP-Core v32.1)

15) Kognitionspotenziale Jugendlicher - Ergänzung zum Jugendfragebogen der Längsschnittstudie Sozio-oekonomisches Panel (SOEP)

16) Die Vercodung der offenen Angaben zur beruflichen Tätigkeit nach der International Standard Classification of Occupations 2008 (ISCO08) - Direktvercodung - Vorgehensweise und Entscheidungsregeln bei nicht eindeutigen Angaben

17) Die Vercodung der offenen Angaben zur beruflichen Tätigkeit nach der Klassifikation der Berufe 2010 (KldB 2010): Vorgehensweise und Entscheidungsregeln bei nicht eindeutigen Angaben

18) Multi-Itemskalen im SOEP Jugendfragebogen

19) Zur Erhebung des adaptiven Verhaltens von zwei- und dreijährigen Kindern im Sozio-oekonomischen Panel (SOEP)

20) Documentation of ISCED Generation Based on the CAMCES Tool in the IAB-SOEP Migration Samples M1/M2 and IAB-BAMF-SOEP Survey of Refugees M3/M4 until 2017

21) Missing Income Data in the German SOEP: Incidence, Imputation and its Impact on the Income Distribution

22) SOEP 2013 – Documentation of Generated Person-Level Long-Term Care Variables in PFLEGE

23) SOEP-Core v34 – PFLEGE: Documentation of Generated Person-level Long-term Care Variables

24) SOEP 2006 – TIMEPREF: Dataset on the Economic Behavior Experiment on Time Preferences in the 2006 SOEP Survey

25) SOEP-Core v34: Codebook for the EU-SILC-Like Panel for Germany Based on the SOEP

26) Assessing the distributional impact of "imputed rent" and "non-cash employee income" in microdata : Case studies based on EU-SILC (2004) and SOEP (2002)

All documentation, with filter options, can be found on this page.