The German Socio-Economic Panel Study (SOEP) is a wide-ranging representative longitudinal study of private households, located at the German Institute for Economic Research, DIW Berlin. Every year, nearly 11,000 households and more than 20,000 persons are surveyed by the fieldwork organization TNS Infratest Sozialforschung. The data provide information on all household members, consisting of Germans living in the old and new German states, foreigners, and recent immigrants to Germany. The panel was started in 1984. Topics include household composition, occupational biographies, employment, earnings, health, and satisfaction indicators.
As early as June 1990, even before the Economic, Social and Monetary Union, SOEP expanded to include the states of the former German Democratic Republic (GDR), thus seizing the rare opportunity to observe the transformation of an entire society. An immigrant sample was added in 1994/95 to account for the changes that had taken place in German society. Further new samples were added in 1998, 2000, 2002, and 2006. The survey is constantly being adapted and developed in response to current social developments.
Title: German Sozio-oekonomisches Panel Study (SOEP), data of the years 1984 – 2008
DOI: 10.5684/soep.v25
Collection period: 1984–2008
Publication date: 27.10.2009
Principal investigators: Gert G. Wagner, Joachim R. Frick, Jürgen Schupp, Silke Anger, Jan Goebel, Markus M. Grabka, Elke Holst, Peter Krause, Martin Kroh, Henning Lohmann, Christian Schmitt, C. Katharina Spieß
Data collector: TNS Infratest Sozialforschung GmbH
Population: Persons living in private households in Germany
Selection method: All samples of SOEP are multi-stage random samples which are regionally clustered. The respondents (households) are selected by random-walk.
Collection mode: The interview methodology of the SOEP is based on a set of pre-tested questionnaires for households and individuals. In principle, an interviewer tries to obtain face-to-face interviews with all members of a given survey household aged 16 years and over. Additionally, one person (the head of household) is asked to answer a household-related questionnaire covering information on housing, housing costs, and different sources of income. This questionnaire also covers some questions on children in the household up to 16 years of age, mainly concerning attendance at institutions (kindergarten, elementary school, etc.).
Dataset information:
Number of units | 62,101 |
Number of variables | 41,348 in 309 datasets |
Data formats | STATA, SPSS, SAS, CSV |
Publications:
Publications using this file should refer to the above DOI and cite the following references:
If you do not exclude the cases of the migration samples in your analysis, then please also cite the following reference
If you do not exclude the cases of the refugee samples in your analysis, please also cite: IAB-BAMF-SOEP survey of refugees (M3-M5), data for the years 2016-2021,
If you use data from the SOEP-LEE2 surveys, please also cite:
If you would like to refer more specifically, please also cite:
The new dataset (Waves 1-25, 1984-2008) contains extensive improvements, additions, and modifications. It includes the usual wave-specific data (YPBRUTTO, YP, YPKAL, YPGEN, YHBRUTTO, YH, YHGEN, YKIND, and XPLUECKE) as well as the updated files with a longitudinal component (PFAD files, biography files, spell data, and weighting factors).
We now also provide, as a beta release, the data in a more user-friendly format called "SOEPlong". We announced this in SOEPnewsletter 80/2008 and thank all those who provided input on this issue. This new and preliminary version of the SOEP data in long format can be obtained upon request. It contains all data and thus can essentially already be used for final analyses. Because it is preliminary, we suggest that it be ordered only by "power users" who would like to work with us to improve data management, and we do not recommend the new format for inexperienced users. New SOEP users who want to work with the new format should at least be familiar with other panel datasets.
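To illustrate what a long format means in practice, the following minimal Python sketch stacks wave-prefixed columns into one row per person and year. It relies only on the convention, visible in file names such as YP or YPGEN, that a wave letter identifies the survey year (A for 1984 through Y for 2008). The variable stem `INCOME`, the identifier key `persnr`, and all sample values are hypothetical, and the actual SOEPlong layout may differ.

```python
import string

FIRST_YEAR = 1984
# Wave letters A..Y correspond to the survey years 1984..2008 (25 waves).
WAVE_YEAR = {
    letter: FIRST_YEAR + i
    for i, letter in enumerate(string.ascii_uppercase[:25])
}


def wide_to_long(wide_rows, stem, id_key="persnr"):
    """Turn one row per person (wave-prefixed columns) into one row per person and year."""
    long_rows = []
    for row in wide_rows:
        for name, value in row.items():
            if name == id_key or value is None:
                continue  # skip the identifier and waves without an observation
            letter, rest = name[0], name[1:]
            if rest == stem and letter in WAVE_YEAR:
                long_rows.append(
                    {id_key: row[id_key], "year": WAVE_YEAR[letter], stem: value}
                )
    return sorted(long_rows, key=lambda r: (r[id_key], r["year"]))


# Hypothetical wide data: one column per wave (X = 2007, Y = 2008).
wide = [
    {"persnr": 101, "XINCOME": 1800, "YINCOME": 1900},
    {"persnr": 102, "XINCOME": None, "YINCOME": 2100},
]
long_rows = wide_to_long(wide, stem="INCOME")
for r in long_rows:
    print(r)
```

In the long layout, the year becomes an ordinary column, so a panel analysis no longer needs to loop over wave-specific variable names.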
The most important improvements in the new data distribution are listed below:
1 New Datasets
1.1 Dataset BIOAGE06
The new data distribution contains the new file BIOAGE06. For the first time in 2008, it includes the information collected using the special mother-child questionnaire, usually from mothers of five- to six-year-old pre-schoolers. The data thus cover a birth cohort that was first “surveyed” in 2002/2003 with a special Newborn Questionnaire. The new data on pre-school-age children contain children’s height and weight, health, care situation, activities with and without the mother, and media usage. Detailed questions address the care situation. Furthermore, valid information is collected for the first time on the child’s personality (based on the “Big Five” personality traits indicator in the main questionnaire for adults) and socio-emotional behavior (surveyed with a modified version of the Strengths and Difficulties Questionnaire).
1.2 Dataset MOVEDIST
We provide a new dataset on changes of residence. Based on the geo-coordinates at block level, we will provide information about the distance (in meters) between the former and the present residence. However, this information will only be available for moves since 2000 and is NOT available on this DVD! We distribute these data together with data on the spatial planning regions (ROR) on an extra CD-ROM. You need an extended data distribution contract including a data protection concept if you want to use this kind of data. After signing your contract extension, you will receive the data on CD-ROM (at no additional cost).
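The text above does not say how the distance between the two residences is computed. As a rough illustration only, the following Python sketch assumes a great-circle (haversine) distance between two latitude/longitude pairs. The function name `move_distance_m` and the example coordinates (approximate city centers of Berlin and Munich) are not taken from the SOEP documentation.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters


def move_distance_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    # Haversine formula
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


# Hypothetical move from Berlin to Munich (approximate coordinates).
distance = move_distance_m(52.52, 13.405, 48.137, 11.575)
print(f"{distance / 1000:.0f} km")
```

A distance based on block-level coordinates, as in MOVEDIST, would be more precise than city-level coordinates, especially for short moves within one municipality.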
2 New Variables
2.1 Dataset PPFAD
2.2 Dataset PFLEGE
2.3 Dataset PBIOSPE
The data generation process has been updated completely but without changing the basic principles. Therefore, there are only a few barely discernible deviations in the main variables (due to slight changes in the consistency checks of the data). But there are a number of visible changes in the form of additional variables or additional values in already existing variables. A detailed description is available in our documentation on biography and life history data.
2.4 Dataset BIOPAREN
3 Revised Variables
3.1 Dataset PWEALTH and HWEALTH
In 2007, all individuals aged 17 and up were again surveyed on wealth, just as they were for the first time in 2002. These “raw” data were already part of the standard data distribution for Wave 24 and will be distributed with the upcoming data distribution in files containing the data for 2002 and 2007 in “long format”: the file PWEALTH with individual data and the file HWEALTH with data aggregated at the household level. Missing values due to item or partial unit non-response (e.g., missing interviews with individual household members in interviewed households) will be subjected to multiple imputation in complex procedures that take longitudinal information into account. Documentation on this is in preparation. An initial analysis of the new wealth data for 2002 and 2007 is provided in: Joachim R. Frick and Markus M. Grabka. 2009. Wealth Inequality on the Rise in Germany. Weekly Report 5 (10), 62-73.
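Analyses of multiply imputed data usually run the same estimation on each imputed version and then pool the results. The text above does not state how many imputed versions are provided or how they should be combined. The Python sketch below assumes the common pooling approach known as Rubin's rules, and the estimates and variances in it are made-up numbers.

```python
from statistics import mean, variance


def combine_rubin(estimates, variances):
    """Pool point estimates and their sampling variances across imputed datasets."""
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations with matching variances")
    pooled = mean(estimates)          # pooled point estimate
    within = mean(variances)          # average within-imputation variance
    between = variance(estimates)     # between-imputation variance (divisor m - 1)
    total = within + (1 + 1 / m) * between
    return pooled, total


# Hypothetical results from three imputed versions of the same dataset.
pooled, total_var = combine_rubin([10.0, 12.0, 14.0], [1.0, 1.0, 1.0])
print(pooled, total_var)
```

The between-imputation term is what distinguishes this from simply averaging: it carries the uncertainty caused by the missing values into the final standard error.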
3.2 Dataset $PEQUIV
3.3 Dataset HHRF/PHRF
3.4 Dataset $PGEN
3.5 Dataset $HGEN
The domicile-related variables in the wave-specific $HGEN files have been completely revised. New additions include the full imputation of missing values (due to item non-response) for the housing-related variables number of rooms, heating costs, and gross rent excluding heating, as well as for the newly generated variable on utility costs in addition to rent. Finally, “flag variables” show the imputation status, if relevant. Experienced SOEP users may also note that various variable names in the file $HGEN have changed.
3.6 Dataset PPFAD
Feb. 10, 2010 |
Downloadable bug-fix for children's weighting factors of wave Y (2008): Individuals born in 2002 (thus aged 6 in wave Y, 2008) whose parents completed the newly introduced child questionnaire for this particular cohort did not receive a valid score on the wave-specific cross-sectional weighting variable (this population can be identified by YNETTO=23). This affects the variable YPHRF in the file PHRF and the variable W1110108 in the file YPEQUIV. This inaccuracy applies only to these 237 children aged 6 in this particular wave and affects only the individual weights, not the household weights. Moreover, any weighted analysis based only on adult respondents using, for instance, the YP and YPGEN files is virtually unaffected by this error. Users who wish to include the six-year-olds in a weighted analysis are asked to download updated versions of the datasets PHRF and YPEQUIV. Please send an email to soepmail@diw.de to request a personalized URL and further details. |
Dec. 5, 2009 |
In the dataset BIOIMMIG, the variable BIGOBACK (the variable on the probability of returning home) has been assigned incorrectly in some cases since 2001 for the categories -2 (“does not apply”) and 2 (“Yes, probably”). To correct this error, please download the appropriate script for your statistical program (SAS, SPSS or Stata) and run it after adjusting the script to the path of your local settings. Script for Stata (TXT, 320.45 KB) |
Nov. 9, 2009 |
Shortly after completing the DVD, an error in data generation was identified in the file BIOPAREN. To correct this error, please download the appropriate script for your statistical program (SAS, SPSS or Stata) and run it after adjusting the script to the path of your local settings. Script for Stata (TXT, 75.48 KB) If you need an update for another statistical program, please contact our hotline at soepmail@diw.de. |
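For users of the original files, the note dated Feb. 10, 2010 implies that the affected six-year-olds should be left out of person-level weighted analyses until the updated weights are installed. The following Python sketch shows one way to do that. It uses the names YNETTO and YPHRF from the note; the record layout, the key `value`, the YNETTO code 10 for unaffected cases, and all numbers are hypothetical.

```python
def drop_affected_children(rows):
    """Exclude the cases flagged by YNETTO == 23 (children without a valid weight)."""
    return [r for r in rows if r.get("YNETTO") != 23]


def weighted_mean(rows, value_key, weight_key="YPHRF"):
    """Weighted mean that ignores records with a missing or non-positive weight."""
    numerator = 0.0
    denominator = 0.0
    for r in rows:
        w = r.get(weight_key)
        if w is None or w <= 0:
            continue
        numerator += w * r[value_key]
        denominator += w
    if denominator == 0:
        raise ValueError("no records with a valid weight")
    return numerator / denominator


# Hypothetical person records; the last one is an affected six-year-old.
persons = [
    {"YNETTO": 10, "YPHRF": 2.0, "value": 10.0},
    {"YNETTO": 10, "YPHRF": 1.0, "value": 40.0},
    {"YNETTO": 23, "YPHRF": 0.0, "value": 99.0},
]
usable = drop_affected_children(persons)
result = weighted_mean(usable, "value")
print(result)
```

Filtering explicitly on YNETTO rather than relying on the weight value alone makes the exclusion visible in the analysis code and easy to remove once the corrected files are in use.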
Survey Instruments 2008: Field-de
Please find all sample-specific questionnaires of this year and all questionnaires of previous years on this site.
1) Handgreifkraftmessung im Sozio-oekonomischen Panel (SOEP) 2006 und 2008
2) The new IAB-SOEP Migration Sample: an introduction into the methodology and the contents
3) The Request for Record Linkage in the IAB-SOEP Migration Sample
5) The Measurement of Labor Market Entries with SOEP Data: Introduction to the Variable EINSTIEG_ARTK
6) Job submission instructions for the SOEPremote System at DIW Berlin – Update 2014
7) SOEP 2015 – Informationen zu den SOEP-Geocodes in SOEP v32
9) Die Vercodung der offenen Angaben zu den Ausbildungsberufen im Sozio-Oekonomischen Panel
10) Das Studiendesign der IAB-BAMF-SOEP Befragung von Geflüchteten
11) Scales Manual IAB-BAMF-SOEP Survey of Refugees in Germany – revised version
12) SOEP 2010 – Preparation of data from the new SOEP consumption module: Editing, imputation, and smoothing
13) SOEP Scales Manual (updated for SOEP-Core v32.1)
17) Multi-Itemskalen im SOEP Jugendfragebogen
20) SOEP-CoV: Project and Data Documentation
22) SOEP 2013 – Documentation of Generated Person-Level Long-Term Care Variables in PFLEGE
23) SOEP-Core v34 – PFLEGE: Documentation of Generated Person-level Long-term Care Variables
26) SOEP-Core v36: Codebook for the EU-SILC-like panel for Germany based on the SOEP
All documentation for filtering can be found on this page