DIW Berlin: SOEP-Core v26 (data 1984-2009)

The German Socio-Economic Panel Study (SOEP) is a wide-ranging representative longitudinal study of private households, located at the German Institute for Economic Research, DIW Berlin. Every year, nearly 11,000 households and more than 20,000 persons are sampled by the fieldwork organization TNS Infratest Sozialforschung. The data provide information on all household members, covering Germans living in the old and new German states, foreigners, and recent immigrants to Germany. The panel was started in 1984. Topics include household composition, occupational biographies, employment, earnings, and health and satisfaction indicators.
As early as June 1990, even before the Economic, Social and Monetary Union, SOEP expanded to include the states of the former German Democratic Republic (GDR), seizing the rare opportunity to observe the transformation of an entire society. An immigrant sample was also added to account for the changes that took place in German society in 1994/95. Further new samples were added in 1998, 2000, 2002, and 2006. The survey is constantly adapted and developed in response to current social developments.

Dataset Information

Title: German Socio-Economic Panel Study (SOEP), data of the years 1984 – 2009

DOI: 10.5684/soep.v26
Collection Period: 1984–2009
Publication Date: 17.09.2010
Principal Investigators: Gert G. Wagner, Joachim R. Frick, Jürgen Schupp, Silke Anger, Jan Goebel, Markus M. Grabka, Elke Holst, Peter Krause, Martin Kroh, Elisabeth Liebau, Henning Lohmann, Christian Schmitt, C. Katharina Spieß

Data Collector: TNS Infratest Sozialforschung GmbH

Population: Persons living in private households in Germany

Selection method: All samples of SOEP are multi-stage random samples which are regionally clustered. The respondents (households) are selected by random walk.

Collection mode: The interview methodology of the SOEP is based on a set of pre-tested questionnaires for households and individuals. In principle, an interviewer tries to obtain face-to-face interviews with all members of a given survey household aged 16 years and over. Additionally, one person (the head of household) is asked to answer a household-related questionnaire covering information on housing, housing costs, and different sources of income. This questionnaire also covers some questions on children in the household up to 16 years of age, mainly concerning attendance at institutions (kindergarten, elementary school, etc.).

Data set information:

Number of units	66,189
Number of variables	43,155 in 322 data sets
Data formats	STATA, SPSS, SAS, CSV
MD5 fingerprints of every file	TXT files: Stata German (14.06 KB), Stata English (14.06 KB), Stata English+German (14.06 KB), SPSS German (14.06 KB), SPSS English (14.06 KB), SPSS portable German (14.06 KB), SPSS portable English (14.06 KB), SAS German (15.63 KB), SAS English (15.63 KB), SAS portable German (14.06 KB), SAS portable English (14.06 KB)
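The MD5 fingerprint files can be used to check that a downloaded data file is complete and unaltered. Below is a minimal sketch using only Python's standard `hashlib`; the function names are illustrative, and because the layout of the fingerprint TXT files is not described here, extracting the expected digest from them is left to the user.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hexadecimal MD5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # iter() with a sentinel keeps reading until read() returns b"" at end of file
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected_hex):
    """Compare a file's MD5 digest with a published fingerprint (case- and whitespace-tolerant)."""
    return md5_of_file(path) == expected_hex.strip().lower()
```

For example, `verify("zp.dta", "<digest from the fingerprint file>")` returns `True` only if the file matches. The file name here is hypothetical.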

Publications:

Jan Goebel, Markus M. Grabka, Stefan Liebig, Martin Kroh, David Richter, Carsten Schröder, Jürgen Schupp. 2018. The German Socio-Economic Panel Study (SOEP). Jahrbücher für Nationalökonomie und Statistik / Journal of Economics and Statistics (online first), doi: 10.1515/jbnst-2018-0022
Gert G. Wagner, Jan Göbel, Peter Krause, Rainer Pischner, and Ingo Sieber (2008). Das Sozio-oekonomische Panel (SOEP): Multidisziplinäres Haushaltspanel und Kohortenstudie für Deutschland – Eine Einführung (für neue Datennutzer) mit einem Ausblick (für erfahrene Anwender). AStA Wirtschafts- und Sozialstatistisches Archiv 2 (4), 301-328.
Schupp, Jürgen (2009): 25 Jahre Sozio-oekonomisches Panel – Ein Infrastrukturprojekt der empirischen Sozial- und Wirtschaftsforschung in Deutschland, Zeitschrift für Soziologie 38 (5), 350-357.

SOEP-Core - Reference Articles

Publications using this file should refer to the above DOI and cite the following references:

Goebel, Jan, Markus M. Grabka, Stefan Liebig, Martin Kroh, David Richter, Carsten Schröder, and Jürgen Schupp. 2019. The German Socio-Economic Panel (SOEP). Jahrbücher für Nationalökonomie und Statistik (Journal of Economics and Statistics) 239 (2), 345-360. (https://doi.org/10.1515/jbnst-2018-0022)

If you do not exclude the cases of the migration samples in your analysis, then please also cite the following reference:

Herbert Brücker, Martin Kroh, Simone Bartsch, Jan Goebel, Simon Kühne, Elisabeth Liebau, Parvati Trübswetter, Ingrid Tucci & Jürgen Schupp. 2014. The new IAB-SOEP Migration Sample: an introduction into the methodology and the contents. SOEP Survey Paper 216, Series C. Berlin, Nürnberg: DIW Berlin.

If you do not exclude the cases of the refugee samples in your analysis, please also cite:

Herbert Brücker, Yuliya Kosyakova, Nina Rother, Sabine Zinn, Elisabeth Liebau, Wenke Gider, Silvia Schwanhäuser, & Manuel Siegert. 2025. Exploring Integration and Migration Dynamics: The Research Potentials of a Large-Scale Longitudinal Household Study of Refugees in Germany. European Sociological Review. https://doi.org/10.1093/esr/jcaf032.

If you use data from the SOEP-LEE2 surveys, please also cite:

Wenzel Matiaske, Torben Dall Schmidt, Christoph Halbmeier, Martina Maas, Doris Holtmann, Carsten Schröder, Tamara Böhm, Stefan Liebig, and Alexander S. Kritikos. 2023. SOEP-LEE2 : Linking Surveys on Employees to Employers in Germany. Jahrbücher für Nationalökonomie und Statistik Data Observer, 1–14. https://doi.org/10.1515/jbnst-2023-0031.

If you would like to refer to a specific research area, please also cite:

Schröder, Carsten, Johannes König, Alexandra Fedorets, Jan Goebel, Markus M. Grabka, Holger Lüthen, Maria Metzing, Felicitas Schikora, and Stefan Liebig. 2020. The economic research potentials of the German Socio-Economic Panel study. German Economic Review 21 (3), 335-371. (https://doi.org/10.1515/ger-2020-0033)
Giesselmann, Marco, Sandra Bohmann, Jan Goebel, Peter Krause, Elisabeth Liebau, David Richter, Diana Schacht, Carsten Schröder, Jürgen Schupp, and Stefan Liebig. 2019. The Individual in Context(s): Research Potentials of the Socio-Economic Panel Study (SOEP) in Sociology. European Sociological Review 35 (5), 738-755. (https://doi.org/10.1093/esr/jcz029)
Jacobsen, Jannes, Magdalena Krieger, Felicitas Schikora, and Jürgen Schupp. 2021. Growing Potentials for Migration Research using the German Socio-Economic Panel Study. Jahrbücher für Nationalökonomie und Statistik 241 (4), 527-549. (https://doi.org/10.1515/jbnst-2021-0001)
Fedorets, Alexandra, Stefan Kirchner, Jule Adriaans, and Oliver Giering. 2022. Data on Digital Transformation in the German Socio-Economic Panel. Jahrbücher für Nationalökonomie und Statistik 242 (5-6), 691-705. (https://doi.org/10.1515/jbnst-2021-0056)

SOEP-Core v26 - Changes in the Dataset

The 2010 data distribution (data for the years 1984-2009) includes comprehensive improvements, additions, and modifications. For the most recent survey year, 2009, it also provides the usual wave-specific datasets ZPBRUTTO, ZP, ZPKAL, ZPGEN, ZPAGE17, ZHBRUTTO, ZH, ZHGEN, ZKIND, and YPLUECKE, as well as the updated files with a longitudinal component (PFAD files, biography files, spell data, and weighting factors).
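The wave-specific dataset names carry a one-letter wave prefix: the files listed above use Z for 2009 and Y for 2008, and the `$` in generic names such as `$HGEN` stands for that letter. Since 2009 is the 26th wave (Welle 26), 1984 corresponds to A. The helper below is a small sketch that covers only this single-letter scheme for 1984-2009; the function names are illustrative.

```python
def wave_letter(year: int) -> str:
    """Map a survey year 1984-2009 to its one-letter wave prefix (1984 -> 'A', 2009 -> 'Z')."""
    if not 1984 <= year <= 2009:
        raise ValueError("single-letter wave prefixes cover 1984-2009 only")
    return chr(ord("A") + year - 1984)


def wave_file(generic_name: str, year: int) -> str:
    """Expand a generic name such as '$HGEN' to the dataset for one year (e.g. 'ZHGEN')."""
    # Only the leading '$' marks the wave letter; names without '$' (e.g. PPFAD) are unchanged.
    return generic_name.replace("$", wave_letter(year), 1)
```

For instance, `wave_file("$PBRUTTO", 2009)` gives `"ZPBRUTTO"`, one of the 2009 datasets named above.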

1. Beta version in 'long format'

The SOEP data are being provided for the first time ever as a beta version in "long format" in addition to the standard data format. SOEPlong refers to a compressed form of the SOEP data: rather than being provided as wave-specific individual files, all available years and cohorts are pooled (long format).
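To illustrate what pooling means, here is a minimal sketch in plain Python. It assumes each wave-specific file has already been read into a list of dicts, one per respondent. The identifier names `persnr` and `syear` are assumptions for illustration, not necessarily the variable names in the SOEPlong beta files, and harmonizing wave-specific variable names is not shown.

```python
from typing import Dict, List


def pool_waves(waves: Dict[int, List[dict]]) -> List[dict]:
    """Stack wave-specific person records into one long table with a survey-year column.

    `waves` maps a survey year to that year's records. The keys 'persnr' (person id)
    and 'syear' (survey year) are hypothetical names used for this sketch.
    """
    long_rows = []
    for year in sorted(waves):
        for record in waves[year]:
            row = dict(record)   # copy so the wave-specific input is not modified
            row["syear"] = year  # each pooled row is identified by person and year
            long_rows.append(row)
    # Order observations person by person, then chronologically
    long_rows.sort(key=lambda r: (r["persnr"], r["syear"]))
    return long_rows
```

In the long layout, a respondent observed in several waves appears once per year, which makes panel operations such as lags or within-person changes a matter of sorting and grouping instead of merging separate files.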

2. New Subsample I
As part of the recent SOEP innovations, fieldwork began in fall 2009 on a new subsample (Sample I). The subsample is currently being used to test the effect of different incentive strategies on participation in the SOEP, and it will become part of the innovation sample. See SOEPnewsletter 89 for more on this new sample.
In four randomly assigned groups, the following strategies were used:

SOEP standard incentives (one lottery ticket per respondent),
Choice of either a lottery ticket or five euros per individual interview,
Five euros per individual interview,
Ten euros per individual interview.
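Random assignment to the four incentive groups can be sketched as follows. This is a hypothetical illustration, not the documented SOEP fieldwork procedure: it assumes households are the unit of assignment and that roughly equal group sizes are wanted, neither of which is stated here. The function name, group labels, and seed are illustrative.

```python
import random
from typing import Dict, Sequence

# The four incentive strategies described above
GROUPS = (
    "standard: one lottery ticket per respondent",
    "choice: lottery ticket or 5 EUR per interview",
    "5 EUR per interview",
    "10 EUR per interview",
)


def assign_incentive_groups(household_ids: Sequence[str], seed: int) -> Dict[str, str]:
    """Randomly assign households to the four incentive groups in near-equal sizes."""
    rng = random.Random(seed)  # a fixed seed makes the assignment reproducible
    shuffled = list(household_ids)
    rng.shuffle(shuffled)
    # Dealing the shuffled list round-robin keeps group sizes within one of each other
    return {hh: GROUPS[i % len(GROUPS)] for i, hh in enumerate(shuffled)}
```

Because assignment is random, differences in participation between the groups can be attributed to the incentive rather than to who received it.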

The data from the new Sample I have been included in the new release of SOEP data (SOEP v26), but due to the particular features of the subsample, Sample I does not have a weighting framework integrated with the rest of the SOEP samples. For Sample I, we are conducting a mail survey of all non-participants in the four groups. Since this is the first wave of Sample I, we were not able to integrate its biographical information into the existing biography files. The same applies to the biographical information in the dataset PPFAD; for example, the variable MIGBACK is set to -2 for all Sample I cases.
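Because Sample I cases carry -2 in MIGBACK, analyses should not treat that value as a substantive category. The sketch below assumes the usual SOEP convention that negative values are missing codes (only -2 is named in this text). It also assumes that records are plain dicts with a lower-case `migback` key, which is a hypothetical layout.

```python
from typing import Iterable, List, Optional, Tuple


def as_valid(value: int) -> Optional[int]:
    """Map negative SOEP missing codes (such as -2) to None; keep substantive values."""
    return None if value < 0 else value


def split_by_validity(records: Iterable[dict], var: str) -> Tuple[List[dict], List[dict]]:
    """Separate records with a substantive value of `var` from those carrying a missing code."""
    valid, missing = [], []
    for record in records:
        target = valid if as_valid(record[var]) is not None else missing
        target.append(record)
    return valid, missing
```

For example, `split_by_validity(records, "migback")` would place every Sample I record in the second list.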

Apart from that, the following additions and modifications have been made:

3. New and Renamed Datasets

Data on cognitive tests (COGDJ)
For the first time, all available data on the cognitive tests of young people ("Denksport Jugend", DJ) are included in the SOEP data release. Since 2006, these tests have been given to young respondents (aged 16) the first time they participate in the SOEP survey.

4. New Variables

4.1 Dataset $HGEN
Two new variables describing the quality of the dwelling:

EQPLIF$$ "Dwelling has an elevator"
EQPNRJ$$ "Dwelling has alternative energy source"

4.2 Dataset $PEQUIV

There is a new variable on additional child benefits, together with the corresponding imputation flag variable (ADCHB$$ and FADCHB$$).

4.3 Dataset $HBRUTTO - Calendar Year of Interview

For the first time, we now distribute a variable (ZDATUMY) describing the calendar year of the interview. Because of the additional Sample I and the resulting longer fieldwork period, a few cases were successfully interviewed in 2010.

5. Revised Variables

5.1 Datasets $HGEN

The variables on household type TYP1HH$$ and TYP2HH$$ were completely revised and tested for intertemporal consistency.

5.2 Datasets $KIND – KIDLONG

The variables were also completely revised and are now also provided in longitudinal form (KIDLONG) as well as in cross-sectional form in $KIND. This made it necessary to change the variable names in KIDLONG to be consistent over time.

5.3 Datasets BIOMARSM/BIOMARSY

The biographical data set on marital status was revised.

5.4 Dataset BIOTWIN

As of wave Z, the dataset BIOTWIN contains 100 additional cases. This considerable increase in case numbers is due to an adjustment in the data generation procedure: in contrast to the previous procedure, all siblings with an identical year of birth are now considered twins provided that the information on the month of birth is missing. This less restrictive rule is based on the assumption that two separate births in a single calendar year are rare. Nevertheless, the number of false positives in this group with a missing month of birth is likely to exceed the BIOTWIN average. Hence, a new value label was introduced for the variable INFOTWIN to flag these twin groups for the user (code "6": coverage since 2007, congruent year of birth, missing month; in contrast, code "5": coverage since 2007, congruent year and month of birth).
In its current state (wave Z) the dataset BIOTWIN covers 250 sets of twins and 5 sets of triplets.

infotwin:
    [1] Twins - Not in 2006 (gen.)
    [2] Twins - 2006 (Answer Not Verifiable)
    [3] Twins - 2006 (Answer Refused)
    [4] Twins - 2006 (Answer Validated)
    [5] Twins - since 2007 (gen.; congruent year and month of birth)
    [6] Twins - since 2007 (gen.; congruent year of birth, missing month)
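The revised pairing rule can be sketched as follows. This is a hypothetical simplification, not the actual BIOTWIN generation code. It assumes the siblings of one family have already been identified, and it treats a pair as code 6 when at least one month of birth is missing (the text does not say whether one or both must be missing). It only produces codes 5 and 6, which apply to coverage since 2007, and triplets appear as three overlapping pairs.

```python
from itertools import combinations
from typing import List, Optional, Tuple

# (person id, birth year, birth month or None if missing)
Sibling = Tuple[str, int, Optional[int]]


def twin_pairs(siblings: List[Sibling]) -> List[Tuple[str, str, int]]:
    """Return (id_a, id_b, infotwin_code) for sibling pairs treated as twins.

    Code 5: same year and same month of birth.
    Code 6: same year of birth, month of birth missing for at least one sibling.
    """
    pairs = []
    for (a, year_a, month_a), (b, year_b, month_b) in combinations(siblings, 2):
        if year_a != year_b:
            continue
        if month_a is None or month_b is None:
            pairs.append((a, b, 6))  # less restrictive rule: more false positives expected
        elif month_a == month_b:
            pairs.append((a, b, 5))
        # same year but different known months: not treated as twins
    return pairs
```

Keeping the two cases under separate codes lets users drop the less certain code 6 pairs in sensitivity checks.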

5.5 Minor bugs fixed

Correction of MONTH08
Correction of a very few cases in IMMIYEAR
Change in the variable names for questions 25 and 26 in YH and ZH

SOEP-Core v26 - Known Bugs/Fixes

1984-2009 (Wave Z)

Jan. 6, 2011

In the generated longitudinal dataset KIDLONG, the current household number was assigned incorrectly for 3% of the children. The variable HHNRAKT has been corrected accordingly.
In addition, the data in the variable K_NRKID for survey year 1987 have changed for child 397403.
Here, the "number of children in the HH below the age of 16" went from 1 to 2.

Please contact soepmail@diw.de if you use the KIDLONG dataset. We will provide an individualized method of downloading the corrected version for both the 100% dataset for the EEA countries and the 95% version available for use worldwide.

Please note: If you use the corrected dataset KIDLONG, we recommend citing it as follows:
English:
Socio-Economic Panel (SOEP), data for years 1984-2009, version 26.1, SOEP, 2011.
German:
Sozio-oekonomisches Panel (SOEP), Daten für die Jahre 1984-2009, Version 26.1, SOEP, 2011.
Short Version:
SOEP v26.1.

Survey Instruments

Survey Instruments 2009: Field-de

All sample-specific questionnaires for this year and all questionnaires from previous years can be found on this site.

Documentation of the datasets

1) SOEP 2009 – Methodenbericht zum Befragungsjahr 2009 (Welle 26) des Sozio-oekonomischen Panels

Documentation

1) Zur Erhebung des adaptiven Verhaltens von zwei- und dreijährigen Kindern im Sozio-oekonomischen Panel (SOEP)

2) Assessing the distributional impact of "imputed rent" and "non-cash employee income" in microdata : Case studies based on EU-SILC (2004) and SOEP (2002)

All documentation for filtering can be found on this page.

Report on survey methods

1) SOEP 2009 – Methodenbericht zum Befragungsjahr 2009 (Welle 26) des Sozio-oekonomischen Panels
