Title: Socio-Economic Panel (SOEP), data from 1984-2016

DOI: 10.5684/soep.v33.1
Collection period: 1984-2016
Publication date: 2018-01-30
Principal investigators: Jürgen Schupp, Jan Goebel, Martin Kroh, Carsten Schröder, Charlotte Bartels, Klaudia Erhardt, Alexandra Fedorets, Andreas Franken, Marco Giesselmann, Markus Grabka, Peter Krause, Hannes Kröger, Simon Kühne, Maria Metzing, Jana Nebelin, David Richter, Diana Schacht, Paul Schmelzer, Christian Schmitt, Daniel Schnitzlein, Rainer Siegers, Knut Wenzig

Data set information:

Number of units: 126,151
Number of variables: 72,709 in 439 data sets
Data format: Stata, SPSS, SAS, CSV

  • Wagner, Gert G., Joachim R. Frick, and Jürgen Schupp (2007): The German Socio-Economic Panel Study (SOEP) - Scope, Evolution and Enhancements, Schmollers Jahrbuch (Journal of Applied Social Science Studies) 127 (1), 139-169.
  • Schupp, Jürgen (2009): 25 Jahre Sozio-oekonomisches Panel - Ein Infrastrukturprojekt der empirischen Sozial- und Wirtschaftsforschung in Deutschland, Zeitschrift für Soziologie 38 (5), 350-357.
  • Wagner, Gert G., Jan Göbel, Peter Krause, Rainer Pischner, and Ingo Sieber (2008): Das Sozio-oekonomische Panel (SOEP): Multidisziplinäres Haushaltspanel und Kohortenstudie für Deutschland - Eine Einführung (für neue Datennutzer) mit einem Ausblick (für erfahrene Anwender), AStA Wirtschafts- und Sozialstatistisches Archiv 2 (4), 301-328.

Update information

Please also note the known issues of this version and their fixes here.

1 Deletion of incorrectly conducted interviews in the IAB-BAMF-SOEP Survey of Refugees

While preparing the next wave of the IAB-BAMF-SOEP Survey of Refugees, the survey institute determined that an interviewer had not conducted interviews correctly, affecting six percent of the household interviews in the sample. These households were removed from the dataset but remain available on request for survey-methodological analysis at a guest workstation at the SOEP Research Data Center. In addition to deleting the corresponding rows from all affected datasets, we made the following modifications:

  • Because household and individual interviews were deleted, the weights (datasets HHRF and PHRF) had to be updated to account for the slightly reduced number of cases in the 2016 survey year.
  • The new weights were updated or included in the dataset BGPEQUIV.
  • Imputation of monthly household net income (I[1-5]HINC16) was redone for this sample in BGHGEN and in the dataset MIHINC.
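
The notation I[1-5]HINC16 reads like a character-class pattern covering five variable names. A minimal Python sketch of that expansion follows, assuming the bracket simply enumerates the digits 1 to 5. The column list is a made-up example, not the actual content of BGHGEN, and the function name is hypothetical.

```python
import re

# Pattern copied from the text; the bracket is read as a regex character class.
PATTERN = re.compile(r"I[1-5]HINC16")

# Explicit expansion of the shorthand into the five variable names.
expanded = [f"I{i}HINC16" for i in range(1, 6)]


def select_imputed_income(columns):
    """Return the column names that match the pattern exactly (case-sensitive)."""
    return [c for c in columns if PATTERN.fullmatch(c)]


# Hypothetical column list, for illustration only.
example_columns = ["HHNR", "I1HINC16", "I2HINC16", "I6HINC16", "I5HINC16"]
selected = select_imputed_income(example_columns)

print(expanded)
print(selected)
```

Matching with fullmatch rather than search keeps names that merely contain the pattern from being selected by accident.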

2 Update INTID in BG files

Datasets from the current BG wave contained errors in the assignment of interviewer IDs. These were corrected.

3 Corrected number of entries in $$KIND (2014-2016)

Inconsistencies between key variables on population assignment in the PPFAD and $$KIND datasets were corrected. There was an error of one year in the definition of the target population in the $$KIND datasets from 2014 to 2016. In some cases, this led to a lack of information on the year of birth in files on children:

    • bekgjahr: 1998 for all samples
    • bfkgjahr: 1999 for all samples
    • bgkgjahr: 1999 only for samples M3 and M4 in 2016

These corrections also affect the number of cases in the file KIDLONG, which was corrected correspondingly.
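
The $$ in names such as $$KIND stands for the wave prefix, which is why the 2014 to 2016 variables listed above begin with be, bf, and bg. The Python sketch below expands such placeholders. It assumes the letter scheme runs A to Z for 1984 to 2009 and continues with BA from 2010; this matches the names in this section but is not spelled out here. The helper names are illustrative only.

```python
FIRST_WAVE_YEAR = 1984  # first survey year of the collection period


def wave_prefix(year: int) -> str:
    """Map a survey year to its wave prefix under the assumed letter scheme."""
    offset = year - FIRST_WAVE_YEAR
    if not 0 <= offset < 52:
        raise ValueError(f"year {year} is outside the range this sketch covers")
    letter = chr(ord("A") + offset % 26)
    # The first 26 waves use a single letter, the next 26 are prefixed with B.
    return letter if offset < 26 else "B" + letter


def expand(name: str, year: int) -> str:
    """Replace the $$ placeholder in a dataset or variable name."""
    return name.replace("$$", wave_prefix(year))


for year in (2014, 2015, 2016):
    print(year, expand("$$KIND", year), expand("$$kgjahr", year).lower())
```

Variable names appear in lower case in the list above, so the last column lower-cases the expanded name.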

3.1 Change in the $$NETTO codes in 96 cases (children) in the years 2014-2016

During data checks, the $$NETTO codes in PPFAD were also compared and corrected. In survey years 2014 to 2016, some children had been incorrectly assigned code 20 instead of 30 on the variable $$NETTO in the PPFAD dataset. This error has been corrected in v33.1. The correction also made it necessary to update the person weights for the affected survey years (dataset PHRF), because $$NETTO determines which individuals in interviewed households are assigned a valid weight. The updated weights are also contained in v33.1.
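
Users who built extracts from the earlier release may want to see which of their cases are touched by the recode from 20 to 30. The Python sketch below compares two mappings from person ID to $$NETTO code and lists the IDs whose code changed in exactly that way. The IDs and codes are toy values and the function name is hypothetical; nothing here reflects the real data.

```python
def changed_netto(old, new, from_code=20, to_code=30):
    """Return sorted person IDs whose code went from from_code to to_code."""
    return sorted(
        pid
        for pid, code in old.items()
        if code == from_code and new.get(pid) == to_code
    )


# Toy person-ID -> code mappings, invented for illustration.
old_codes = {101: 10, 102: 20, 103: 20, 104: 30}
new_codes = {101: 10, 102: 30, 103: 20, 104: 30}

affected = changed_netto(old_codes, new_codes)
print(affected)
```

Since the person weights in PHRF were updated as well, weights from an older extract should be replaced rather than reused.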


4 Update missing values in BIOPAREN

In BIOPAREN, a number of missing values in the flag variables for parental (professional) education and in the years of death of the parents were filled in.


5 Update imputation of missing dates in spells

The algorithm for imputing missing dates in the spells was optimized. As a result, the imputed variables and the variables derived from them changed in v33.1, specifically all variables with the suffix _imp and the variable staytime. The changes affected a total of 349 of 15,640 spells.
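
For scale, the share of changed spells follows directly from the two counts given above. A quick check in Python:

```python
changed_spells = 349
total_spells = 15_640

# Share of spells whose imputed values changed, in percent.
share_percent = 100 * changed_spells / total_spells
print(f"{share_percent:.2f}% of spells changed")
```

This works out to roughly 2.2 percent of all spells.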

6 Update AUSB16 in BGPGEN

The variable AUSB16 (“profession requires vocational training”) in BGPGEN was updated. The correction substantially decreased the number of missing values (code -1).
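
One way to see the effect of this correction in one's own extract is to count the -1 codes before and after the update. A minimal Python sketch follows, assuming the variable is available as a plain list of integer codes. The values are invented for illustration and the function name is hypothetical.

```python
def count_code(values, code=-1):
    """Count how often a given code occurs in a list of values."""
    return sum(1 for v in values if v == code)


# Toy values, not real data: the same six cases before and after an update.
ausb16_old = [1, -1, 2, -1, -1, 1]
ausb16_new = [1, 2, 2, -1, 1, 1]

before = count_code(ausb16_old)
after = count_code(ausb16_new)
print(f"-1 codes before: {before}, after: {after}")
```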



The SOEP micro data which we make available for scientific research can only be interpreted using statistical software. Direct use of SOEP data is subject to the high standards for lawful data protection in the Federal Republic of Germany. Signing a contract on data distribution with the DIW Berlin is therefore a precondition for working with SOEP data. After signing the contract, the data of every new wave will be available on request. 
