SOEP-Core v35 (2018) - Dataset Information





Survey year / wave


The German Socio-Economic Panel (SOEP) study is a wide-ranging, representative, annual longitudinal study of private households, based at the German Institute for Economic Research (DIW Berlin). In 2018, more than 33,000 individuals in almost 20,000 households were interviewed by the fieldwork organization Kantar Public. The data provide information on all household members, consisting of Germans living in the eastern and western German states, foreigners, and immigrants to Germany. The panel was started in 1984. Topics include household composition, occupational biographies, employment, earnings, health, and satisfaction indicators. As early as June 1990, even before the Economic, Social and Monetary Union, SOEP expanded to include the states of the former German Democratic Republic (GDR), thus seizing the rare opportunity to observe the transformation of an entire society. Immigrant samples were added in 1994/95 and 2013/2015 to account for the changes that took place in German society. Two samples of refugees were introduced in 2016 and another in 2017. Further new samples were added in 1998, 2000, 2002, 2006, 2009, 2010, 2011, 2012, 2017, and 2018. The survey is constantly being adapted and developed in response to current social developments. The international version contains 95% of all cases surveyed (see 10.5684/soep.v35i).


Title: Socio-Economic Panel (SOEP), data from 1984-2018

DOI: 10.5684/soep-core.v35
Collection period: 1984-2018
Publication date: 01.11.2019
Principal investigators: Stefan Liebig, Jan Goebel, Martin Kroh, Carsten Schröder, Markus Grabka, Jürgen Schupp, Charlotte Bartels, Alexandra Fedorets, Andreas Franken, Jannes Jacobsen, Selin Kara, Peter Krause, Hannes Kröger, Maria Metzing, David Richter, Diana Schacht, Paul Schmelzer, Christian Schmitt, Daniel Schnitzlein, Rainer Siegers, Knut Wenzig, Stefan Zimmermann

Contributor: Kantar Deutschland GmbH (Data Collector)

Population: Persons living in private households in Germany

Sampling: All SOEP samples are regionally clustered, multi-stage random samples. The respondents (households) are selected by random-walk or register sampling.

Collection Mode: The interview methodology of the SOEP is based on a set of pre-tested questionnaires for households and individuals. In principle, an interviewer tries to obtain face-to-face interviews with all members of a given survey household aged 16 years and over. In addition, one person (the head of household) is asked to answer a household questionnaire covering housing, housing costs, and different sources of income. This questionnaire also includes some questions on children in the household up to 16 years of age, mainly concerning attendance at institutions (kindergarten, elementary school, e


  • Jan Goebel, Markus M. Grabka, Stefan Liebig, Martin Kroh, David Richter, Carsten Schröder, Jürgen Schupp (2019): The German Socio-Economic Panel Study (SOEP), Jahrbücher für Nationalökonomie und Statistik / Journal of Economics and Statistics 239 (2), 345-360.
  • Jürgen Schupp (2009): 25 Jahre Sozio-oekonomisches Panel - Ein Infrastrukturprojekt der empirischen Sozial- und Wirtschaftsforschung in Deutschland, Zeitschrift für Soziologie 38 (5), 350-357.
  • Gert G. Wagner, Jan Göbel, Peter Krause, Rainer Pischner, Ingo Sieber (2008): Das Sozio-oekonomische Panel (SOEP): Multidisziplinäres Haushaltspanel und Kohortenstudie für Deutschland - Eine Einführung (für neue Datennutzer) mit einem Ausblick (für erfahrene Anwender), AStA Wirtschafts- und Sozialstatistisches Archiv 2 (4), 301-328.
  • Gert G. Wagner, Joachim R. Frick, Jürgen Schupp (2007): The German Socio-Economic Panel Study (SOEP) - Scope, Evolution and Enhancements, Schmollers Jahrbuch (Journal of Applied Social Science Studies) 127 (1), 139-169.

For the SOEP-Core data 1984-2018 (v35) - Wave A to BI - we provide the following versions:


soep-core.v35i (International Scientific Use Version, 95%)

soep-core.v35t (Teaching version)

These datasets are included in SOEP v35 but are also available as individual datasets upon request:

soep.ddr18 (Living in the GDR)

soep.iab-soep-mig.2018 (Migration samples)

soep.iab-bamf-soep-mig.2018 (Refugee samples)

SOEP-Core soep-core.v35

1. New sample in the main SOEP study

The new refresher sample, Subsample O, contains 1,000 new households. These were selected in cooperation with BBSR using a new sampling design based on regional data in areas where the “Soziale Stadt” (social city) urban development project is being carried out. Based on the digital data available on the boundaries of the “Soziale Stadt” areas, it was possible to create a new variable going back to the year 2000 that shows whether or not a household’s address is within an area covered by the project (see Variable Description below under 4.4).

2. Modifications in our new main data format, SOEPlong

We have made the following important changes over and above our normal annual updates:

  • PKAL: Integration of the $PKALOST datasets
  • PL/PKAL: All calendar strings are now stored in PL and monthly variables in PKAL
  • PLUECKEL: Introduction of RYEAR and correction of SYEAR, which until now contained the values of RYEAR
  • PBRUTTO: If a variable was not part of the year-specific gross file, the missing code has been changed to -8 and is no longer -2.
  • VPL: The case numbers for past years have increased since cases without a SOEP respondent are no longer deleted
  • KIDLONG: The harmonization concept has been adapted to the concept used with other datasets; more variables from $KIND datasets have been included (more information under 4)
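Scripts that filter PBRUTTO on missing codes may need to distinguish the new -8 from -2. The following is a minimal Python sketch of that distinction, not an official SOEP tool. It assumes the rows have already been read into dictionaries; the column name EXAMPLE_VAR and the sample rows are hypothetical, and only the codes -8 and -2 are taken from this page.

```python
# Codes as described on this page: -8 now marks a variable that was not part
# of the year-specific gross file; -2 remains the "does not apply" code.
NOT_IN_YEAR_FILE = -8
DOES_NOT_APPLY = -2


def split_by_missing_code(rows, var):
    """Group rows by the kind of value found in column `var`."""
    groups = {"not_in_year_file": [], "does_not_apply": [], "other": []}
    for row in rows:
        value = row[var]
        if value == NOT_IN_YEAR_FILE:
            groups["not_in_year_file"].append(row)
        elif value == DOES_NOT_APPLY:
            groups["does_not_apply"].append(row)
        else:
            # Everything else, including valid values and other codes.
            groups["other"].append(row)
    return groups


# Hypothetical sample rows, for illustration only.
sample_rows = [
    {"SYEAR": 1990, "EXAMPLE_VAR": -8},
    {"SYEAR": 2005, "EXAMPLE_VAR": -2},
    {"SYEAR": 2018, "EXAMPLE_VAR": 3},
]
grouped = split_by_missing_code(sample_rows, "EXAMPLE_VAR")
```

Code written against earlier releases that treats -2 as "not surveyed in that year" would need to check for -8 instead.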

3. New in SOEPhelp

  • SOEPhelp now includes links between topics and variables from the metadata. The data overview (command: soephelp (without variable)) lists all the topics in the dataset and tells which variables belong to which topic.
  • The variable overview (command: soephelp [variable]) lists the topics covered by the variable (and the relationships among topics and subtopics). The topic labels are linked to
  • SOEPhelp now has a search tool! If you type in the command: soephelp, search (SEARCH TERM) [verbose], you will get a list of the variables for which your SEARCH TERM is contained either in the question or one of the answer options. The variables are provided in list form and saved in r (for returns). The option “verbose” describes the variables in more detail.
  • More information on SOEPhelp

4. New Datasets and Variables

4.1. Early Childhood

  • New dataset BCBFK “Early Childhood” with geographically detailed information about the places where the respondents grew up. Because of the detailed regional data, the dataset is only available through the RDC SOEP. The corresponding field report and questionnaire are available as SOEP Survey Paper 766 (PDF, 1.28 MB) (in German).

4.2. Your Life in the GDR

  • New dataset DDR18 “Your Life in the GDR”; the corresponding questionnaire is available as Survey Paper 676.

4.3. Biography follow-up survey

  • The variables from the biography follow-up survey on migration status have been integrated into the dataset BILELA or BIOL.

4.4. New variable SOCURBAN in dataset HBRUTTO

  • SOCURBAN: Household address is in an area where the “Soziale Stadt” (social city) urban development project is being carried out (as of July 2017) (Yes/No)

4.5. New variables in dataset EQUIV

  • ILIB1$$: pensions for liberal professions
  • ILIB2$$: widow / orphans pensions for liberal professions

4.6. New variables in dataset BIOJOB

  • In 2018, respondents received new survey instruments concerning job classifications and prestige scores. This information is provided in the following new variables: STBA10, ISCO08, EGP08, ISEI08, MPS08, and SIOPS08. The corresponding variables STBA, EGP, ISEI, MPS, and SIOPS from older versions of BIOJOB have been renamed to STBA92, EGP88, ISEI88, MPS92, and SIOPS88.
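Variable lists written for older versions of BIOJOB will refer to the old names. A minimal Python sketch of updating such a list could look as follows. The mapping is taken from the renaming described above; the helper name update_variable_list and the example list (including the identifier PERSNR) are hypothetical.

```python
# Old name -> new name in BIOJOB, as listed on this page for v35.
BIOJOB_RENAMES = {
    "STBA": "STBA92",
    "EGP": "EGP88",
    "ISEI": "ISEI88",
    "MPS": "MPS92",
    "SIOPS": "SIOPS88",
}


def update_variable_list(variables):
    """Replace renamed BIOJOB variables; leave all other names unchanged."""
    return [BIOJOB_RENAMES.get(name, name) for name in variables]


# Hypothetical variable list from an older analysis script.
old_list = ["PERSNR", "EGP", "ISEI", "ISCO08"]
new_list = update_variable_list(old_list)
```

The lookup is case-sensitive, so variable names stored in lowercase would need to be normalized first.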

5. Changes to datasets or individual variables

5.1. Weighting variable PHRF in the dataset PPATHL

  • There are slight changes concerning the poststratification of the weighting variables starting in 2013. The changes relate to the year of immigration. Previously, respondents who immigrated before 1955 were treated as migrants; they now constitute a distinct category of their own, along with recent immigrants and German-born respondents. The reason is that it is not possible to define ethnic Germans consistently between the Mikrozensus and the SOEP.

5.2. Variables representing occupational codes

  • Since 2013, open-ended questions on occupations have been coded in ISCO-08 and KldB 2010. This is the first year in which the old classifications ISCO-88 and KldB 92 are no longer available. We have therefore introduced new prestige scores based on the new classifications and discontinued the old scores.
  • Calendar strings have been moved from $PKAL to $P or standardized.

5.3. Educational variables

  • Up to soep.v34, the basic generated educational variables were generated annually and were cumulated over time. Due to the availability of SOEPlong, we have substantially revised the tools used for generating variables to always consider all available educational variables for each year.
  • In addition to the fact that all variables are now generated based entirely on SOEPlong files, we have also made two additional modifications:
  • First, the main educational variables now also take into account inconsistencies over time, in contrast to the educational variables in PGEN prior to soep.v34.
  • Second, the variable “Amount of Education or Training in Years” ($$BILZEIT) has been slightly modified. To take occupational training (for non-university degrees) into account, we have slightly adjusted the years of education for “civil servants” and “others”.

5.4. Dataset KIDLONG

  • Errors in the integration of variables were corrected, variables were split into versioned variables, and harmonized variables were constructed. As a result, the number of variables has increased from 110 (v.34) to 267 (v.35).
  • Missing variables from the $KIND datasets were incorporated into KIDLONG.
  • Corrected version of BHKIND was incorporated into KIDLONG.
  • KIDLONG now adheres to the classic harmonization concept.

5.5. Dataset BHKIND

  • Flag variable added to identify child questionnaires that were not completed (BHKFLAG)
  • Missing observations were added: the number of observations increased from 15,032 (v.34) to 15,504 (v.35).
  • Errors in the integration of variables were corrected and missing variables were incorporated into BHKIND: the number of variables increased from 85 (v.34) to 129 (v.35).
  • All variables were renamed and now follow the SOEP naming conventions.

5.6. Dataset BIKIND

  • Flag variable added to identify child questionnaires that were not completed (BIKFLAG)
  • All variables now follow the SOEP naming conventions.

5.7. Variable PARID in the dataset PPATHL

  • Partnerships of respondents with net codes between 40 and 49 were dissolved and will be coded -2 “does not apply” in the future.
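For anyone reproducing this recode on an older extract, the rule can be sketched in Python as below. This is an illustration under stated assumptions, not the procedure used by the SOEP team: the column name NETTO for the net code is an assumption (this page does not name the variable), the range 40 to 49 is treated as inclusive, and the sample rows are hypothetical.

```python
DOES_NOT_APPLY = -2  # code named on this page for dissolved partnerships


def dissolve_partnerships(rows, net_key="NETTO", partner_key="PARID"):
    """Return copies of the rows with PARID set to -2 for net codes 40 to 49."""
    recoded = []
    for row in rows:
        new_row = dict(row)  # copy, so the input rows stay untouched
        if 40 <= new_row[net_key] <= 49:  # inclusive range is an assumption
            new_row[partner_key] = DOES_NOT_APPLY
        recoded.append(new_row)
    return recoded


# Hypothetical rows, for illustration only.
sample = [
    {"PID": 101, "NETTO": 10, "PARID": 102},
    {"PID": 103, "NETTO": 45, "PARID": 104},
]
result = dissolve_partnerships(sample)
```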

5.8. Variable HGOWNER in the dataset HGEN

  • In samples M3-M5 in 2017, several missing values in the variable HGOWNER were replaced with the information that a household is living in a shelter or housing for refugees.

5.9. Dataset INTERVIEWER

  • The year 2016 now contains information from Samples L2-M4.
  • The variable on the length of the interview (LENGTHINT) was eliminated and replaced by three variables, each of which gives the average length of one questionnaire (LENGTHINT- H / P / J).
  • The youth surveys, which were previously counted in the number of interviews per person (AMOUNTINTP), now have their own variable (AMOUNTINTJ).

5.10. Dataset BIOAGE17

  • Previous versions of BIOAGE17 contained the identifiers of the respondent’s mother (BYMNR) and father (BYVNR). The parents’ identifiers are found in BIOPAREN (MNR and VNR) and can easily be merged with BIOAGE17.
  • Desired occupation variables ISCO88 have been replaced by ISCO08. The same is true for BYKLAS: The old 1992 version has been replaced by the 2010 version.
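The merge of parent identifiers from BIOPAREN into BIOAGE17 can be sketched as a simple left join. The Python example below is a minimal illustration that assumes both datasets have been read into lists of dictionaries and share a person identifier column; the key name PID and the sample rows are assumptions, while MNR and VNR are the BIOPAREN variables named above.

```python
def attach_parent_ids(bioage17_rows, bioparen_rows, key="PID"):
    """Left-join MNR and VNR from BIOPAREN onto BIOAGE17 rows by `key`."""
    parents = {row[key]: row for row in bioparen_rows}
    merged = []
    for row in bioage17_rows:
        parent = parents.get(row[key], {})
        # Respondents without a BIOPAREN row get None for both identifiers.
        merged.append({**row, "MNR": parent.get("MNR"), "VNR": parent.get("VNR")})
    return merged


# Hypothetical rows, for illustration only.
bioage17 = [{"PID": 501}, {"PID": 502}]
bioparen = [{"PID": 501, "MNR": 201, "VNR": 202}]
merged_rows = attach_parent_ids(bioage17, bioparen)
```

A left join keeps every BIOAGE17 respondent, so respondents without a match in BIOPAREN remain in the result with empty parent identifiers.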

5.11. Dataset BIOAGEL

  • The internal distinction between BIOAGE 8a and 8b, or between 81 and 82, has been eliminated, meaning that the dataset BIOAGEL now contains one line per child and respondent for questionnaires about 7-8-year-old children. As a result, when each parent completed a questionnaire on a child in a given year, there are two lines for that child (one line per parent). These can be identified by the different PIDE (PID of the respondent).
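Because a child can now appear twice in a given year, analyses that expect one line per child may want to check for such cases first. The Python sketch below groups rows by child and year and lists the respondents (PIDE) who reported on each child. The column names PID for the child and SYEAR for the survey year are assumptions, and the sample rows are hypothetical.

```python
from collections import defaultdict


def respondents_per_child(rows, child_key="PID", year_key="SYEAR"):
    """Map (child, year) to the list of PIDE values reporting on that child."""
    by_child_year = defaultdict(list)
    for row in rows:
        by_child_year[(row[child_key], row[year_key])].append(row["PIDE"])
    return dict(by_child_year)


def children_with_multiple_reports(rows, child_key="PID", year_key="SYEAR"):
    """Keep only (child, year) pairs with more than one responding parent."""
    grouped = respondents_per_child(rows, child_key, year_key)
    return {k: v for k, v in grouped.items() if len(v) > 1}


# Hypothetical rows: child 900 has one questionnaire from each parent in 2018.
bioagel = [
    {"PID": 900, "SYEAR": 2018, "PIDE": 301},
    {"PID": 900, "SYEAR": 2018, "PIDE": 302},
    {"PID": 901, "SYEAR": 2018, "PIDE": 303},
]
multiple = children_with_multiple_reports(bioagel)
```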

Individual: Field-de,en Var-de Var-en
Household: Field-de,en Var-en Var-en
Biography: Field-de,en Var-de Var-en
Catch-up Individual: Field-de,en
Youth (16-17-year-olds): Field-de,en Var-de Var-en
Pre-teen (11-12-year-olds): Field-de,en
Mother and Child (Newborns): Field-de,en
Mother and Child (2-3-year-olds): Field-de,en
Mother and Child (5-6-year-olds): Field-de,en
Interviewer: Var-en Var-en
Parents and Child (7-8-year-olds): Field-de,en
Mother and Child (9-10-year-olds): Field-de,en
Deceased Individual: Field-de,en
Grip Strength: Field-de,en
Life in the former GDR: Field-de Var-de Var-en

All sample-specific questionnaires for this year and all questionnaires from previous survey years can be found on this page.