The German Socio-Economic Panel Study (SOEP) is a wide-ranging representative longitudinal study of private households, based at the German Institute for Economic Research, DIW Berlin. Every year, nearly 11,000 households and more than 20,000 persons are surveyed by the fieldwork organization TNS Infratest Sozialforschung. The data provide information on all household members and cover Germans living in the old and new German states, foreigners, and recent immigrants to Germany. The panel was started in 1984. Topics include household composition, occupational biographies, employment, earnings, health, and satisfaction indicators.
As early as June 1990, even before the Economic, Social and Monetary Union, SOEP expanded to include the states of the former German Democratic Republic (GDR), thus seizing the rare opportunity to observe the transformation of an entire society. An immigrant sample was added in 1994/95 to account for the changes that had taken place in German society. Further new samples were added in 1998, 2000, 2002, and 2006. The survey is constantly being adapted and developed in response to current social developments.

Title: German Sozio-oekonomisches Panel Study (SOEP), data of the years 1984 – 2008

DOI: 10.5684/soep.v25
Collection period: 1984–2008
Publication date: 27.10.2009
Principal investigators: Gert G. Wagner, Joachim R. Frick, Jürgen Schupp, Silke Anger, Jan Goebel, Markus M. Grabka, Elke Holst, Peter Krause, Martin Kroh, Henning Lohmann, Christian Schmitt, C. Katharina Spieß

Data collector: TNS Infratest Sozialforschung GmbH

Population: Persons living in private households in Germany

Selection method: All samples of SOEP are multi-stage random samples which are regionally clustered. The respondents (households) are selected by random-walk.

Collection mode: The interview methodology of the SOEP is based on a set of pre-tested questionnaires for households and individuals. In principle, an interviewer tries to obtain face-to-face interviews with all members of a given survey household aged 16 years and over. Additionally, one person (the head of household) is asked to answer a household-related questionnaire covering information on housing, housing costs, and different sources of income. This questionnaire also includes some questions on children in the household up to 16 years of age, mainly concerning attendance at institutions (kindergarten, elementary school, etc.).

Number of units: 62,101
Number of variables: 41,348 in 309 datasets

  • Jan Goebel, Markus M. Grabka, Stefan Liebig, Martin Kroh, David Richter, Carsten Schröder, Jürgen Schupp. 2018. The German Socio-Economic Panel Study (SOEP). Jahrbücher für Nationalökonomie und Statistik / Journal of Economics and Statistics (online first), doi: 10.1515/jbnst-2018-0022
  • Gert G. Wagner, Jan Göbel, Peter Krause, Rainer Pischner, and Ingo Sieber. 2008. Das Sozio-oekonomische Panel (SOEP): Multidisziplinäres Haushaltspanel und Kohortenstudie für Deutschland - Eine Einführung (für neue Datennutzer) mit einem Ausblick (für erfahrene Anwender), AStA Wirtschafts- und Sozialstatistisches Archiv 2 (4), 301-328.

  • Goebel, Jan, Markus M. Grabka, Stefan Liebig, Martin Kroh, David Richter, Carsten Schröder, and Jürgen Schupp. 2019. The German Socio-Economic Panel (SOEP). Jahrbücher für Nationalökonomie und Statistik (Journal of Economics and Statistics) 239 (2), 345-360.
  • Schröder, Carsten, Johannes König, Alexandra Fedorets, Jan Goebel, Markus M. Grabka, Holger Lüthen, Maria Metzing, Felicitas Schikora, and Stefan Liebig. 2020. The economic research potentials of the German Socio-Economic Panel study. German Economic Review 21 (3), 335-371.
  • Giesselmann, Marco, Sandra Bohmann, Jan Goebel, Peter Krause, Elisabeth Liebau, David Richter, Diana Schacht, Carsten Schröder, Jürgen Schupp, and Stefan Liebig. 2019. The Individual in Context(s): Research Potentials of the Socio-Economic Panel Study (SOEP) in Sociology. European Sociological Review 35 (5), 738-755.

The new dataset (Waves 1-25, 1984-2008) contains extensive improvements, additions, and modifications. Besides the usual wave-specific data (YPRUTTO, YP, YPKAL, YPGEN, YHBRUTTO, YH, YHGEN, YKIND, and XPLUECKE) and the updated files with a longitudinal component (PFAD files, biography files, spell data, and weighting factors), we now also provide, as a beta release, the data in a more user-friendly format called "SOEPlong". We announced this in SOEPnewsletter 80/2008 and thank all those who provided input on this issue. This new and preliminary version of the SOEP data in long format can be obtained upon request. It contains all data and thus can essentially already be used for final analyses. However, we suggest that it be ordered only by "power users" who would like to work with us to improve data management, and we do not recommend the new format for inexperienced users. New SOEP users who want to work with the new format should at least be familiar with other panel datasets.
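
The difference between the wave-specific ("wide") files and a long format can be illustrated with a small Python sketch. It assumes, based on the file names above, that a leading letter identifies the wave (A = 1984 through Y = 2008). The variable names and values are made up for illustration and do not describe the actual SOEPlong layout.

```python
import string

FIRST_WAVE_YEAR = 1984  # wave A, per the collection period 1984-2008
WAVE_LETTERS = string.ascii_uppercase[:25]  # A..Y, i.e. waves 1-25


def wave_year(letter: str) -> int:
    """Map a single wave letter (A..Y) to its survey year."""
    letter = letter.upper()
    if len(letter) != 1 or letter not in WAVE_LETTERS:
        raise ValueError(f"not a wave letter in A..Y: {letter!r}")
    return FIRST_WAVE_YEAR + WAVE_LETTERS.index(letter)


def wide_to_long(person_id, wide_row):
    """Turn {'XINCOME': 1800, 'YINCOME': 1900} into one record per year.

    Assumes every key is a wave letter followed by a wave-independent
    variable stem (a hypothetical naming scheme for this sketch).
    """
    records = {}
    for name, value in wide_row.items():
        year = wave_year(name[0])
        stem = name[1:]
        records.setdefault(year, {"pid": person_id, "year": year})[stem] = value
    return [records[year] for year in sorted(records)]


# Invented example: one person observed in waves X (2007) and Y (2008)
rows = wide_to_long(101, {"XINCOME": 1800, "YINCOME": 1900, "YHOURS": 38})
for row in rows:
    print(row)
```

In the long layout, each person contributes one record per survey year, so the same variable stem can be analyzed across waves without renaming.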

The most important improvements in the new data distribution are listed below:

1 New Datasets

1.1 Dataset BIOAGE06
The new data distribution contains the new file BIOAGE06. For the first time in 2008, it includes the information collected using the special mother-child questionnaire, usually from mothers of five- to six-year-old pre-schoolers. The data thus cover a birth cohort that was first “surveyed” in 2002/2003 with a special Newborn Questionnaire. The new data on pre-school-age children contain children’s height and weight, health, care situation, activities with and without the mother, and media usage. Detailed questions address the care situation. Furthermore, valid information is collected for the first time on the child’s personality (based on the “Big Five” personality traits indicator in the main questionnaire for adults) and socio-emotional behavior (surveyed with a modified version of the Strengths and Difficulties Questionnaire).

1.2 Dataset MOVEDIST
We provide a new dataset on changes of residence. Based on geo-coordinates at block level, it gives the distance (in meters) between the former and the present residence. However, the information is only available for moves since 2000 and is NOT available on this DVD! We distribute these data together with data on the spatial planning regions (ROR) on an extra CD-ROM. You need an extended data distribution contract including a data protection concept if you want to use this kind of data. After signing your contract extension, you will receive the data on CD-ROM (at no additional cost).
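
How the distance in MOVEDIST is actually computed is not described here. Purely as an illustration of a distance in meters between two coordinate pairs, the sketch below uses the haversine great-circle formula. The function name, the mean Earth radius, and the example coordinates are assumptions and not part of the SOEP data.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters, an approximation


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


# Two arbitrary points, roughly the city centers of Berlin and Munich
distance = haversine_m(52.52, 13.405, 48.137, 11.575)
print(round(distance / 1000), "km")
```

For short moves within a city, a planar distance on projected coordinates would give nearly the same result. The great-circle form is used here only because it needs no projection.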

2 New Variables 

2.1 Dataset PPFAD

  • MIGBACK / MIGINFO: MIGBACK provides time-invariant information on an individual’s migration background, derived from the individual’s own and parental data. MIGINFO indicates the sources of the information used, in order to provide users with the highest possible transparency. A detailed description is available in the extensive biography documentation (see chapter on PPFAD).

2.2 Dataset PFLEGE

  • PAY / STUFE: two new variables on paid care (PAY) and the care level (STUFE) according to the German compulsory long-term care insurance.

2.3 Dataset PBIOSPE

The data generation process has been updated completely but without changing the basic principles. Therefore, there are only a few barely discernible deviations in the main variables (due to slight changes in the consistency checks of the data). But there are a number of visible changes in the form of additional variables or additional values in already existing variables. A detailed description is available in our documentation on biography and life history data.

2.4 Dataset BIOPAREN

  • BIO: origin of information is $LELA or $JUGEND
  • ALTER / VALTER / MALTER: age of respondent / father / mother, all at the time of the biography interview.
  • Attention: A bug was discovered in the dataset shortly after the DVD was completed. To update the information on parental religious affiliation, please see our page Known Bugs/Fixes.

3 Revised Variables

3.1 Dataset PWEALTH and HWEALTH
In the year 2007, all individuals aged 17 and up were again surveyed on wealth, just as they were for the first time in 2002. These “raw” data were already part of the standard data distribution for Wave 24 and will be distributed with the upcoming data distribution in a file containing the data for 2002 and 2007 in “long format” – the file PWEALTH for individual data, HWEALTH with data aggregated according to household context. Missing values due to item or partial unit non-response (e.g., missing interviews with individual household members in interviewed households) will be subjected to multiple imputation in complex procedures taking longitudinal information into account. Documentation on this is under preparation. An initial analysis of the new wealth data for 2002 and 2007 is provided in: Joachim R. Frick and Markus M. Grabka. 2009. Wealth Inequality on the Rise in Germany. Weekly Report 5 (10), 62-73 (PDF, 383.22 KB).
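
Multiply imputed data are usually analyzed by computing an estimate on each imputed dataset and then combining the results. A minimal sketch of the standard combining rules (Rubin's rules) follows. The numbers are invented, the number of imputations is arbitrary, and the function is not part of any SOEP tooling.

```python
from statistics import mean, variance


def combine_rubin(estimates, variances):
    """Combine per-imputation estimates and their sampling variances.

    Returns the pooled estimate and its total variance
    T = W + (1 + 1/m) * B, where W is the mean within-imputation
    variance and B is the between-imputation variance.
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations with matching variances")
    pooled = mean(estimates)
    within = mean(variances)
    between = variance(estimates)  # sample variance, divisor m - 1
    total = within + (1 + 1 / m) * between
    return pooled, total


# Invented example: the same statistic estimated on five imputed datasets
est = [81000.0, 80500.0, 82000.0, 81500.0, 80000.0]
var = [250000.0, 240000.0, 260000.0, 255000.0, 245000.0]
pooled, total = combine_rubin(est, var)
print(pooled, total ** 0.5)  # pooled estimate and its standard error
```

The between-imputation term is what distinguishes this from simply averaging. It carries the uncertainty caused by the missing values into the final standard error.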

3.2 Dataset $PEQUIV

3.3 Dataset HHRF/PHRF

  • $HHRF / $PHRF: We provide an update and revision of the post-stratification scheme and an additional regional calibration of SOEP weights.
    Short documentation (PDF, 87.06 KB)

3.4 Dataset $PGEN

  • EMPLST$$: A new category has been added to this variable ("Employment status"). From 1998 on, the SOEP data contain information on working in a sheltered workshop for the disabled. Since these persons do not provide information on whether they work full-time, part-time, or on an irregular basis, the new category "sheltered workshop" has been included.
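
In names such as $PGEN and EMPLST$$, the "$" appears to stand for the wave letter and "$$" for the two-digit survey year (so that $PGEN corresponds to YPGEN for 2008). Under that assumption, a small helper can expand such a placeholder name for a given year. The function name is hypothetical.

```python
def expand_name(template: str, year: int) -> str:
    """Replace '$$' with the two-digit year and '$' with the wave letter.

    Assumes wave A corresponds to 1984 and covers the waves A..Y
    (1984-2008) of this data distribution.
    """
    offset = year - 1984
    if not 0 <= offset < 25:
        raise ValueError(f"year {year} is outside 1984-2008")
    letter = chr(ord("A") + offset)
    # Handle the two-character placeholder first so it is not split.
    return template.replace("$$", f"{year % 100:02d}").replace("$", letter)


print(expand_name("$PGEN", 2008))
print(expand_name("EMPLST$$", 2008))
```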

3.5 Dataset $HGEN
The domicile-related variables in the wave-specific $HGEN files have been completely revised. New additions include the full imputation of missing values (due to item non-response) for the housing-related variables number of rooms, heating costs, and gross rent excluding heating, as well as the newly generated variable on utility costs in addition to rent. Finally, “flag variables” show the imputation status, if relevant. Experienced SOEP users may also note changes to various variable names in the file $HGEN.

3.6 Dataset PPFAD

  • TODJAHR / TODINFO: To separate panel mortality from demographic reasons for dropping out of the SOEP sample, TNS Infratest carried out several studies to determine the current residence of panel dropouts, i.e., earlier respondents who no longer take part in the SOEP. This entailed locating 17,195 persons. These investigations identified 981 cases in which the dropout had died. In total, however, 3,791 deaths had been identified in the SOEP up to 2008 (see also the documentation on the variables TODJAHR and TODINFO in the file PPFAD). Additionally, our fieldwork organization TNS Infratest provides documentation in German (“Wiederbefragung von Panelausfällen”, PDF, 368.88 KB) and an English-language summary (PDF, 36.18 KB).

Feb. 10, 2010

Downloadable bug-fix for children's weighting factors of wave Y (2008)

Individuals born in 2002 (thus being 6 years of age in wave Y, 2008) whose parents completed the newly introduced child questionnaire for this particular cohort did not receive a valid score on the wave-specific cross-sectional weighting variable (this population can be identified by YNETTO=23). This affects the variable YPHRF in the file PPHRF and the variable W1110108 in the file YPEQUIV. This inaccuracy applies only to these 237 children aged 6 in this particular wave and affects only the individual, but not the household weights. Moreover, any weighted analysis based only on adult respondents using, for instance, the YP and YPGEN files is virtually unaffected by this error. Users who wish to include the six-year-olds in a weighted analysis are asked to download updated versions of the datasets YPHRF and YPEQUIV.
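
Before the updated files are installed, it can be useful to check which cases are affected and to make sure they do not enter weighted statistics with an unusable weight. The sketch below assumes that an invalid weight is stored as a missing or non-positive value. Only the names YPHRF and YNETTO and the code 23 come from the text above. The records and all other codes are invented.

```python
def weighted_mean(records, value_key, weight_key="YPHRF"):
    """Weighted mean over records that carry a usable (positive) weight."""
    usable = [r for r in records if (r.get(weight_key) or 0) > 0]
    total_weight = sum(r[weight_key] for r in usable)
    if total_weight == 0:
        raise ValueError("no records with a valid weight")
    return sum(r[value_key] * r[weight_key] for r in usable) / total_weight


# Invented records; YNETTO codes other than 23 are placeholders
people = [
    {"pid": 1, "YNETTO": 10, "YPHRF": 1500.0, "age": 40},
    {"pid": 2, "YNETTO": 10, "YPHRF": 500.0, "age": 60},
    {"pid": 3, "YNETTO": 23, "YPHRF": 0.0, "age": 6},  # affected child
]

affected = [p["pid"] for p in people if p["YNETTO"] == 23]
result = weighted_mean(people, "age")
print(affected, result)
```

Dropping the affected cases is only a stopgap. Analyses that are meant to include the six-year-olds need the corrected weights from the updated files.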

Please send us an email to request a personalized URL and further details.

Dec. 5, 2009

In the dataset BIOIMMIG, the variable BIGOBACK (the probability of returning home) has been incorrectly assigned in some cases since 2001 for the categories -2 (“does not apply”) and 2 (“Yes, probably”).

To correct this error, please download the appropriate script for your statistical program (SAS, SPSS or Stata) and run it after adjusting the script to the path of your local settings.

Script for Stata (TXT, 320.45 KB)

Script for SPSS (TXT, 289.2 KB)

Script for SAS (TXT, 309.72 KB)


Nov. 9, 2009

Shortly after completing the DVD, an error in data generation was identified in the file BIOPAREN.
The error is in the categories of parental religious affiliation (MRELI, VRELI). The codes for the categories "other Christian affiliation", "Islamic affiliation" as well as "other religious affiliation" require correction. The other categories of the variable are not affected.

To correct this error, please download the appropriate script for your statistical program (SAS, SPSS or Stata) and run it after adjusting the script to the path of your local settings.

Script for Stata (TXT, 75.48 KB)

Script for SPSS (TXT, 64.96 KB)

Script for SAS (TXT, 75.55 KB)

If you need an update for another statistical program, please contact our hotline.

