The German Socio-Economic Panel Study (SOEP) is a wide-ranging representative longitudinal study of private households, located at the German Institute for Economic Research, DIW Berlin. Every year, nearly 11,000 households and more than 20,000 persons are sampled by the fieldwork organization TNS Infratest Sozialforschung. The data provide information on all household members, covering Germans living in the old and new German states, foreigners, and recent immigrants to Germany. The panel was started in 1984. Topics include household composition, occupational biographies, employment, earnings, health, and satisfaction indicators. As early as June 1990 (even before the Economic, Social and Monetary Union), SOEP expanded to include the states of the former German Democratic Republic (GDR), thus seizing the rare opportunity to observe the transformation of an entire society. In 1994/95, an immigrant sample was added to account for the changes that had taken place in German society. Further new samples were added in 1998, 2000, 2002, 2006 and 2009. The survey is constantly being adapted and developed in response to current social developments.
Title: German Socio-Economic Panel Study (SOEP), data of the years 1984–2010
Collection period: 1984-2010
Publication date: Oct. 21, 2011
Principal investigators: Gert G. Wagner, Joachim R. Frick, Jürgen Schupp, Silke Anger, Marco Giesselmann, Jan Goebel, Markus M. Grabka, Elke Holst, Peter Krause, Martin Kroh, Elisabeth Liebau, Henning Lohmann, David Richter, Christian Schmitt, Daniel Schnitzlein, C. Katharina Spieß
Data collector: TNS Infratest Sozialforschung GmbH
Population: Persons living in private households in Germany
Selection method: All samples of SOEP are multi-stage random samples which are regionally clustered. The respondents (households) are selected by random walk.
Collection mode: The SOEP interview methodology is based on a set of pretested questionnaires for households and individuals. In principle, an interviewer tries to obtain face-to-face interviews with all members of a given survey household aged 16 years and over. In addition, one person (the head of household) is asked to answer a household questionnaire covering housing, housing costs, and different sources of income. This questionnaire also includes some questions on children in the household up to 16 years of age, mainly concerning attendance at institutions (kindergarten, elementary school, etc.).
|Number of units||66,813|
|Number of variables||45,536 in 339 datasets|
|Data format||STATA, SPSS, SAS, CSV|
|MD5 fingerprints of the individual files||
Stata German | TXT, 15.15 KB
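The MD5 fingerprints are meant to be compared with digests computed locally after download. A minimal sketch using Python's standard `hashlib`; the layout of the fingerprint TXT file is not described here, so the expected digest is simply passed in as a string, and any file name you use it with (e.g. `bapgen.dta`) is hypothetical.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hexadecimal MD5 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # iter() with a sentinel keeps calling fh.read until it returns b"" (EOF).
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_fingerprint(path, expected_hex):
    """True if the file's MD5 equals the published fingerprint (case-insensitive)."""
    return md5_of_file(path) == expected_hex.strip().lower()
```

Reading in chunks keeps memory use flat even for the large pooled files on the DVD.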
Publications using this file should refer to the above DOI and cite one of the following references:
The release of the 1984-2010 SOEP data (waves A-BA) will contain the usual year-specific data files (BAP, BAH, BAPGEN, BAHGEN, BAPKAL, BAPBRUTTO, BAHBRUTTO, BAKIND and ZPLUECKE) and the updated files with a longitudinal component (PFAD files, biography files, spell data, and weighting factors). The respondents of Sample I (Incentives Sample) answered the biographical background questionnaire for the very first time in 2010.
Since minor changes have been made to many of the older datasets as well, we strongly recommend reinstalling all of the datasets from the new DVD.
1. New two-letter prefix (BA)
This SOEP data release (v27) will include, for the first time in the survey's 27 years, a two-letter rather than a single-letter wave prefix. Since we came to the end of the Latin alphabet with the letter Z in our last data release, we decided to use the wave prefix BA for the cross-sectional data format.
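The prefix scheme can be expressed as a small lookup from survey year to wave prefix, which also yields the year-specific file names (e.g. BAPGEN). This is a sketch: only A (1984) through Z (2009) and BA (2010) are confirmed by these notes, and continuing with BB, BC, and so on for later waves is an assumption.

```python
import string


def wave_prefix(year):
    """Return the SOEP wave prefix for a survey year.

    1984 -> 'A', ..., 2009 -> 'Z' (26 single-letter waves), 2010 -> 'BA'.
    Assumption: later waves continue BB, BC, ... (only BA is confirmed).
    """
    index = year - 1984  # 0 for the first wave
    if index < 0:
        raise ValueError("SOEP starts in 1984")
    letters = string.ascii_uppercase
    if index < 26:
        return letters[index]
    if index < 52:
        return "B" + letters[index - 26]
    raise ValueError("prefixes beyond BZ are not covered by this sketch")
```

For example, `wave_prefix(2010) + "PGEN"` gives `"BAPGEN"`, one of the year-specific files listed above.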
2. Updated beta version in "long format"
The SOEP data are now also available in "long format" as a beta version in addition to the usual data format. SOEPlong refers to a compressed form of the SOEP data. Rather than being provided as wave-specific individual files, all available years and cohorts are pooled (long format). The data are available on the second DVD. For details, see SOEPnewsletter No. 90/2010 (PDF, 3.53 MB).
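Conceptually, the long format stacks the wave-specific files and replaces the wave-specific variable names with one harmonized name plus an explicit survey-year column. The sketch below illustrates that idea on plain dictionaries; it is not how SOEPlong is actually produced. The variable names `zinc`, `bainc`, and `inc` are hypothetical, and `persnr` is assumed to be the person identifier.

```python
def to_long(wave_files, renames):
    """Pool wave-specific person records into a single long list.

    wave_files: {survey_year: [record, ...]}, each record a dict with a
        'persnr' person id plus wave-specific variables.
    renames: {survey_year: {wave_specific_name: harmonized_name}}.
    Returns one dict per person-year with an explicit 'syear' column,
    sorted by person and year.
    """
    rows = []
    for year, records in wave_files.items():
        mapping = renames.get(year, {})
        for record in records:
            # Map wave-specific names to one shared column name per concept.
            row = {mapping.get(name, name): value for name, value in record.items()}
            row["syear"] = year
            rows.append(row)
    rows.sort(key=lambda r: (r["persnr"], r["syear"]))
    return rows
```

For instance, a 2009 record `{"persnr": 1, "zinc": 100}` and a 2010 record `{"persnr": 1, "bainc": 110}` become two rows of the same person, both with a column `inc`.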
3. Elimination of fakes
When the data for the second wave of our newest sample, Sample I, were checked, 36 households were identified as having faked interviews; they are therefore no longer included in this data release.
4. New and renamed datasets
4.1 BIOAGE08
The BIOAGE08 dataset contains data from the new "parent questionnaire," which is given to the mothers and fathers of seven- to eight-year-old children. Data are thus now available on the 2002/2003 birth cohorts that were first observed with the "newborn questionnaire." Because both mothers and fathers answer the parent questionnaire, there are two sets of responses for many of the children in the sample. The file was therefore split into two, based on the parent's gender and the type of household the respondent lives in: BIOAGE08A includes mothers and, where no information from the mother was available, fathers; BIOAGE08B includes fathers only. The documentation of this new dataset is included as a new chapter in our documentation on biography and life history data in SOEP (coming soon).
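The BIOAGE08A selection rule (mother's questionnaire if available, otherwise the father's) can be sketched as follows. The record keys `child_id` and `parent` are hypothetical, and the household-type criterion that also enters the split is not modeled; whether fathers placed in BIOAGE08A also appear in BIOAGE08B is not stated here, so the sketch covers only the A file.

```python
def select_bioage08a(responses):
    """Keep one parent questionnaire per child: the mother's, else the father's.

    responses: list of dicts with hypothetical keys 'child_id' and
    'parent' ('mother' or 'father'). Returns the chosen records sorted by child.
    """
    chosen = {}
    for resp in responses:
        current = chosen.get(resp["child_id"])
        # Take the first response seen, but let a mother's response replace a father's.
        if current is None or (current["parent"] == "father" and resp["parent"] == "mother"):
            chosen[resp["child_id"]] = resp
    return [chosen[cid] for cid in sorted(chosen)]
```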
4.2 LIFESPELL
The LIFESPELL dataset contains data from the follow-up studies of SOEP dropouts (1992, 2001, 2006, and 2008), which were not previously included in the regular data release. The follow-up studies, based on information from public registers, serve to identify the current residence of former SOEP respondents. They thus allow studies of life expectancy and emigration decisions for a large percentage of SOEP respondents, even long after they have dropped out of the study. The documentation of this new dataset is included as a new chapter in our documentation on biography and life history data in SOEP (coming soon).
4.3 BIOEDU (beta version)
The BIOEDU dataset, which is being released this year in provisional form (beta release), contains details on educational transitions beginning with entrance into childcare up to tertiary education in consistently structured form. Users who work with these data are requested to report on their experiences (especially any problems they might have), so that a final version can be released next year.
Detailed documentation is in the DIW Data Documentation 58 (PDF, 383.03 KB).
Because of its provisional form, this dataset is not part of the normal distribution and you will find the data on the DVD in an extra archive (link available in the NEWS file on the DVD).
5. New Variables
5.1 $PEQUIV dataset
In the $PEQUIV-files, there will be a new additional variable on support payments. With the 2010 questionnaire, the SOEP has split the item "support payments" into two separate items. The first one now collects information on "alimony from legal spousal support, child support, and child care support" (ALIM$$), while the second item asks about "advance child maintenance payments" (IACHM$$). More information about the $PEQUIV files and the new variables is available in the DIW Data Documentation 57 (PDF, 0.54 MB).
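In names such as $PEQUIV and ALIM$$, `$` is read here as the wave prefix and `$$` as the two-digit survey year. The sketch below expands these placeholders and sums the two new items to approximate the former single "support payments" item. Both the summation and the treatment of negative values as missing-value codes are assumptions, not documented procedures; check the DIW Data Documentation 57 before relying on them.

```python
def expand_name(template, prefix, year):
    """Expand SOEP name placeholders: '$$' -> two-digit year, '$' -> wave prefix.

    '$$' is replaced first, otherwise each '$' of it would become the prefix.
    """
    return template.replace("$$", f"{year % 100:02d}").replace("$", prefix)


def combined_support(alim, iachm):
    """Approximate the pre-2010 'support payments' item as ALIM$$ + IACHM$$.

    Assumption: negative values are missing-value codes, so the sum is undefined.
    """
    if alim < 0 or iachm < 0:
        return None
    return alim + iachm
```

With these assumptions, `expand_name("ALIM$$", "BA", 2010)` gives `"ALIM10"` and `expand_name("$PEQUIV", "BA", 2010)` gives `"BAPEQUIV"`.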
5.2 PFLEGE dataset
The PFLEGE file now includes two new variables. "FURTHER" gives the number of further persons in the household who require help; this question has been asked since 2009. "CARECOST" gives the regular monthly amount a household normally spends on care; this question has been asked since 2010.
5.3 Dataset $PGEN
We are now providing detailed data on educational degrees and training qualifications prior to joining the panel: life course questionnaires have been distributed since 2001 to collect data on apprenticeship occupation, type of qualification (e.g., diploma), and the field of study for those who have obtained a degree. Up to now, data from these open-answer questions were not included in the data release. From now on, however, these data will be released in coded form. The classifications used for the data from the individual questionnaire have been slightly modified in the process of these revisions. A more detailed description is available within the PGEN documentation.
The new variables are
|FIELD$$||Field of tertiary education|
|DEGREE$$||Type of tertiary degree|
|TRAINA$$||Apprenticeship-two-digit occupation KldB92|
|TRAINB$$||Vocational school-two-digit occupation KldB92|
|TRAINC$$||Higher vocational school-two-digit occupation KldB92|
|TRAIND$$||Civil servant training-two-digit occupation KldB92|
|FDT_F$$||Data source FIELD, DEGREE, TRAIN|
6. Revised Variables
6.1 $P dataset
Name changes to the occupation and sector classification variables in $P: the variables in the $P datasets are collected in alternate years from all respondents, and from those individuals who changed occupations. In addition, we generate this information for all years and all individuals and distribute it in the $PGEN datasets. To distinguish the generated variables more clearly from the originally surveyed ones, and to link each variable clearly to the question number in the respective questionnaire, we have renamed the variables according to the following system:
|Old variable name||New variable name|
Thus, ZIS88, for example, is now ZP29_IS88. The corresponding variables of all datasets from the individual questionnaire ($P) have been renamed, but not the variables recommended for use from the generated datasets ($PGEN, e.g., IS8809 from ZP).
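The renaming can be sketched as a string rule inferred from the single documented example (ZIS88 becomes ZP29_IS88): wave prefix, then "P", the question number, an underscore, and the old name without its prefix. The rule is an inference, and the question number differs by questionnaire, so it has to be looked up rather than derived.

```python
def new_p_name(old_name, prefix, question_number):
    """Build the renamed $P classification variable.

    Pattern inferred from ZIS88 -> ZP29_IS88:
    wave prefix + 'P' + question number + '_' + old name without its prefix.
    """
    if not old_name.startswith(prefix):
        raise ValueError(f"{old_name!r} does not start with wave prefix {prefix!r}")
    return f"{prefix}P{question_number}_{old_name[len(prefix):]}"
```

For example, `new_p_name("ZIS88", "Z", 29)` gives `"ZP29_IS88"`, which helps when updating old analysis scripts.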
6.2 $HBRUTTO dataset
The coding of the variable identifying the federal state ($BULA, Bundesland) in which a household is located is now consistent with the coding used in official statistics.
|$bula (old codings)||$bula (new codings)|
|1 Schleswig - Holstein||1 Schleswig-Holstein|
|2 Hamburg||2 Hamburg|
|3 Niedersachsen||3 Niedersachsen|
|4 Bremen||4 Bremen|
|5 Nordrhein-Westfalen||5 Nordrhein-Westfalen|
|6 Hessen||6 Hessen|
|7 Rheinl.-Pfalz, Saarl.||7 Rheinland-Pfalz|
|8 Baden-Wuerttemberg||8 Baden-Wuerttemberg|
|9 Bayern||9 Bayern|
|11 Berlin (Ost)||11 Berlin|
|12 Mecklenburg-Vorpommern||12 Brandenburg|
|13 Brandenburg||13 Mecklenburg-Vorpommern|
|14 Sachsen-Anhalt||14 Sachsen|
|15 Thueringen||15 Sachsen-Anhalt|
|16 Sachsen||16 Thueringen|
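To harmonize $BULA values from an earlier release with the new coding, the table can be turned into a lookup by matching state names. In this sketch, old code 7 (Rheinland-Pfalz and Saarland pooled) cannot be assigned to a single new code and returns `None`. Codes that do not appear in the table above are also returned as `None` rather than guessed.

```python
# Old -> new $BULA codes, read off the table by matching state names.
OLD_TO_NEW_BULA = {
    1: 1, 2: 2, 3: 3, 4: 4, 5: 5, 6: 6,  # unchanged
    8: 8, 9: 9,                          # unchanged
    11: 11,  # old "Berlin (Ost)" -> new "Berlin"
    12: 13,  # Mecklenburg-Vorpommern
    13: 12,  # Brandenburg
    14: 15,  # Sachsen-Anhalt
    15: 16,  # Thueringen
    16: 14,  # Sachsen
}


def recode_bula(old_code):
    """Return the new $BULA code, or None if it cannot be determined.

    Old code 7 pooled Rheinland-Pfalz and Saarland, so it has no unique target.
    """
    return OLD_TO_NEW_BULA.get(old_code)
```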
A differentiation between East and West Berlin can still be achieved by a combination with $SAMPREG (Sample Region in $PPFAD).
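Combining the two variables might look like the sketch below. It assumes the new $BULA coding (11 = Berlin) and a $SAMPREG coding of 1 = West Germany and 2 = East Germany; verify the latter against the $PPFAD codebook. Since $SAMPREG is stored in a person-level file, the household's $BULA value must first be merged onto each person via the household identifier.

```python
def berlin_part(bula, sampreg):
    """Return 'East Berlin' or 'West Berlin' for Berlin households, else None.

    Assumes new $BULA coding (11 = Berlin) and $SAMPREG 1 = West, 2 = East.
    """
    if bula != 11:
        return None
    return {1: "West Berlin", 2: "East Berlin"}.get(sampreg)
```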
6.3 Minor bug fixes
1984-2010 (Wave BA)
|March 30, 2012||
An update for all corrected files can be downloaded, but only by means of a personalized link. Please contact email@example.com to obtain such a link.
Please note: If you use one of the provided bugfixes in your analyses we recommend citing it as follows:
|Jan 2, 2012||
Also, in the $PGEN data sets, no English value labels were generated for the new variables on educational degrees and training qualifications prior to joining the panel. This applies to the English labels for the following variables:
If you use one of those variables, please contact firstname.lastname@example.org to obtain a download link for the bugfixes.
PPFADL in SOEPlong
An update for PPFADL can be downloaded, but only by means of a personalized link. Please contact email@example.com to obtain such a link.
Please note: If you use one of the provided bugfixes in your analyses we recommend citing it as follows:
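15) Die Vercodung der offenen Angaben zur beruflichen Tätigkeit nach der International Standard Classification of Occupations 2008 (ISCO08) - Direktvercodung - Vorgehensweise und Entscheidungsregeln bei nicht eindeutigen Angaben [Coding of open-ended responses on occupation according to the International Standard Classification of Occupations 2008 (ISCO08): direct coding, procedures and decision rules for ambiguous responses]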
All documentation for filtering can be found on this page