SOEP-Core v27 (data 1984-2010)

The German Socio-Economic Panel Study (SOEP) is a wide-ranging, representative longitudinal study of private households, located at the German Institute for Economic Research, DIW Berlin. Every year, nearly 11,000 households and more than 20,000 persons are sampled by the fieldwork organization TNS Infratest Sozialforschung. The data provide information on all household members, covering Germans living in the old and new German federal states, foreigners, and recent immigrants to Germany. The panel was started in 1984. Topics include household composition, occupational biographies, employment, earnings, and health and satisfaction indicators. As early as June 1990, even before the Economic, Social and Monetary Union, SOEP expanded to include the states of the former German Democratic Republic (GDR), thus seizing the rare opportunity to observe the transformation of an entire society. An immigrant sample was added as well, to account for the changes that took place in German society in 1994/95. Further new samples were added in 1998, 2000, 2002, 2006, and 2009. The survey is constantly being adapted and developed in response to current social developments.

Dataset Information

Title: German Socio-Economic Panel Study (SOEP), data for the years 1984–2010

DOI: 10.5684/soep.v27
Collection period: 1984-2010
Publication date: Oct. 21, 2011
Principal investigators: Gert G. Wagner, Joachim R. Frick, Jürgen Schupp, Silke Anger, Marco Giesselmann, Jan Goebel, Markus M. Grabka, Elke Holst, Peter Krause, Martin Kroh, Elisabeth Liebau, Henning Lohmann, David Richter, Christian Schmitt, Daniel Schnitzlein, C. Katharina Spieß

Data collector: TNS Infratest Sozialforschung GmbH

Population: Persons living in private households in Germany

Selection method: All samples of SOEP are multi-stage random samples which are regionally clustered. The respondents (households) are selected by random walk.

Collection mode: The interview methodology of the SOEP is based on a set of pre-tested questionnaires for households and individuals. In principle, an interviewer tries to obtain face-to-face interviews with all members of a given survey household aged 16 years and over. Additionally, one person (the head of household) is asked to answer a household-related questionnaire covering information on housing, housing costs, and different sources of income. This questionnaire also covers some questions on children in the household up to 16 years of age, mainly concerning attendance at institutions (kindergarten, elementary school, etc.).

Dataset details:

Number of units: 66,813
Number of variables: 45,536 in 339 datasets
Data format: Stata, SPSS, SAS, CSV
MD5 fingerprints of the individual files:

Stata German | TXT, 15.15 KB
Stata English | TXT, 15.15 KB
Stata German+English | TXT, 15.15 KB
SPSS German | TXT, 15.15 KB
SPSS English | TXT, 15.15 KB
SAS German | TXT, 15.15 KB
SAS English | TXT, 16.8 KB
CSV | TXT, 15.15 KB
bioedu | TXT, 270 Byte
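The fingerprint files listed above can be used to check that downloaded data files arrived intact. The following is a minimal sketch, assuming each fingerprint file uses the common `md5sum` layout of one `<hex digest>  <file name>` pair per line; the actual layout of the SOEP TXT files is not described on this page and may differ, and the function names are hypothetical.

```python
import hashlib
from pathlib import Path


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_fingerprints(fingerprint_file, data_dir):
    """Compare files in data_dir against an md5sum-style fingerprint list.

    Assumption: lines look like '<hex digest>  <file name>'. This is not a
    documented SOEP format. Returns {file name: True (match), False
    (mismatch) or None (file not found)}.
    """
    results = {}
    for line in Path(fingerprint_file).read_text().splitlines():
        line = line.strip()
        if not line:
            continue
        expected, name = line.split(maxsplit=1)
        name = name.lstrip("*")  # md5sum marks binary mode with a leading '*'
        target = Path(data_dir) / name
        if not target.is_file():
            results[name] = None
        else:
            results[name] = md5_of(target) == expected.lower()
    return results
```

Any entry that comes back `False` or `None` suggests the file should be downloaded again before analysis.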

Publications:

  • Goebel, Jan, Markus M. Grabka, Stefan Liebig, Martin Kroh, David Richter, Carsten Schröder, and Jürgen Schupp. 2018. The German Socio-Economic Panel Study (SOEP). Jahrbücher für Nationalökonomie und Statistik / Journal of Economics and Statistics (online first). doi: 10.1515/jbnst-2018-0022
  • Wagner, Gert G., Jan Göbel, Peter Krause, Rainer Pischner, and Ingo Sieber. 2008. Das Sozio-oekonomische Panel (SOEP): Multidisziplinäres Haushaltspanel und Kohortenstudie für Deutschland - Eine Einführung (für neue Datennutzer) mit einem Ausblick (für erfahrene Anwender). AStA Wirtschafts- und Sozialstatistisches Archiv 2 (4), 301-328.
  • Schupp, Jürgen. 2009. 25 Jahre Sozio-oekonomisches Panel - Ein Infrastrukturprojekt der empirischen Sozial- und Wirtschaftsforschung in Deutschland. Zeitschrift für Soziologie 38 (5), 350-357.

Publications using this file should refer to the above DOI and cite one of the following references:

  • Goebel, Jan, Markus M. Grabka, Stefan Liebig, Martin Kroh, David Richter, Carsten Schröder, and Jürgen Schupp. 2019. The German Socio-Economic Panel (SOEP). Jahrbücher für Nationalökonomie und Statistik (Journal of Economics and Statistics) 239 (2), 345-360. (https://doi.org/10.1515/jbnst-2018-0022)
  • Schröder, Carsten, Johannes König, Alexandra Fedorets, Jan Goebel, Markus M. Grabka, Holger Lüthen, Maria Metzing, Felicitas Schikora, and Stefan Liebig. 2020. The economic research potentials of the German Socio-Economic Panel study. German Economic Review 21 (3), 335-371. (https://doi.org/10.1515/ger-2020-0033)
  • Giesselmann, Marco, Sandra Bohmann, Jan Goebel, Peter Krause, Elisabeth Liebau, David Richter, Diana Schacht, Carsten Schröder, Jürgen Schupp, and Stefan Liebig. 2019. The Individual in Context(s): Research Potentials of the Socio-Economic Panel Study (SOEP) in Sociology. European Sociological Review 35 (5), 738-755. (https://doi.org/10.1093/esr/jcz029)

SOEP v27 (Original dataset)

SOEP v27.1 (bugfix release, Jan. 2012)

SOEP v27.2 (bugfix release, March 2012)

SOEP v27.2i (international version)

The release of the 1984-2010 SOEP data (waves A-BA) will contain the usual year-specific data files (BAP, BAH, BAPGEN, BAHGEN, BAPKAL, BAPBRUTTO, BAHBRUTTO, BAKIND and ZPLUECKE) and the updated files with a longitudinal component (PFAD files, biography files, spell data, and weighting factors). The respondents of Sample I (Incentives Sample) answered the biographical background questionnaire for the very first time in 2010.
Since minor changes have been made to many of the older datasets as well, we strongly recommend reinstalling all of the datasets from the new DVD.

1. New two-letter prefix (BA)

This SOEP data release (v27) includes, for the first time in the survey's 27 years, a two-letter rather than a single-letter wave prefix. Since the last data release (2009, wave Z) reached the end of the Latin alphabet, we decided to use the wave prefix BA for the cross-sectional data format.
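To make the prefix scheme concrete: waves A to Z cover 1984 to 2009, and 2010 becomes BA. The sketch below maps a survey year to its prefix. The continuation BB, BC, ... for years after 2010 is an assumption not stated in this release, and `wave_prefix` is a hypothetical helper, not part of any SOEP tool.

```python
import string


def wave_prefix(year):
    """Return the SOEP wave prefix for a survey year.

    1984 -> 'A', ..., 2009 -> 'Z' (single letters), 2010 -> 'BA'.
    Assumption: later years continue as 'BB', 'BC', ...; this release
    only documents prefixes up to BA.
    """
    offset = year - 1984
    letters = string.ascii_uppercase
    if offset < 0:
        raise ValueError("SOEP starts in 1984")
    if offset < 26:
        return letters[offset]
    if offset < 52:
        return "B" + letters[offset - 26]
    raise ValueError("prefixes beyond 'BZ' are not covered by this sketch")
```

For example, `wave_prefix(2009)` gives `Z` (the prefix of the ZP file) and `wave_prefix(2010)` gives `BA` (the prefix of BAP and BAH).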

2. Updated beta version in "long format"

The SOEP data are now also available in "long format" as a beta version in addition to the usual data format. SOEPlong refers to a compressed form of the SOEP data. Rather than being provided as wave-specific individual files, all available years and cohorts are pooled (long format). The data are available on the second DVD. For details, see SOEPnewsletter No. 90/2010 (PDF, 3.53 MB).
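To illustrate what "pooled" means, here is a toy sketch that turns wave-specific records into long-format rows with a survey-year column (called `syear` here). It is not how SOEPlong is actually produced. For illustration only, it assumes that wave-specific variable names carry the wave prefix (as ZHTYP and BAHTYP do in the 2009 and 2010 household files), that identifiers such as the hypothetical `HID` key are shared across waves, and that the long-format name is simply the variable name without its prefix.

```python
def to_long(wave_files, prefixes, id_vars):
    """Pool wave-specific records into long-format rows.

    wave_files: {survey_year: [record dict, ...]}
    prefixes:   {survey_year: wave prefix}, e.g. {2009: 'Z', 2010: 'BA'}
    id_vars:    names kept unchanged (identifiers shared across waves)
    """
    rows = []
    for year in sorted(wave_files):
        prefix = prefixes[year]
        for record in wave_files[year]:
            row = {"syear": year}
            for name, value in record.items():
                if name in id_vars:
                    row[name] = value
                elif name.startswith(prefix):
                    # Drop the wave prefix so the same item lines up across years.
                    row[name[len(prefix):]] = value
                else:
                    raise ValueError(f"{name!r} lacks wave prefix {prefix!r}")
            rows.append(row)
    return rows
```

Identifiers are checked before prefixes so that a key such as `HID` is not truncated in a wave whose prefix happens to be `H`.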

3. Elimination of fakes

When the data for the second wave of our newest sample, Sample I, were checked, 36 households were identified as having faked interviews; they are therefore no longer included in this data release.

4. New and renamed datasets

4.1 BIOAGE08[A|B]

The BIOAGE08 dataset contains data from the new "parent questionnaire", which is given to the mothers and fathers of seven- to eight-year-old children. Thus, data are now available on the 2002/2003 birth cohorts that were first observed with the "newborn questionnaire." Because the new "parent questionnaire" is given to both mothers and fathers, it provides two sets of responses on many of the children in the sample. The file was therefore split into two on the basis of the parent's gender and the type of household the respondent lives in. BIOAGE08A includes mothers, plus fathers in cases where no information from the mother was available. BIOAGE08B includes fathers only. The documentation of this new dataset is included as a new chapter in our documentation on biography and life history data in SOEP (coming soon).
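The gender rule for the A/B split can be sketched as follows. This is a simplified illustration, not the SOEP production code: the record layout (`child`, `parent` keys) is hypothetical, the household-type criterion is not modelled, one record per parent and child is assumed, and it is an assumption that BIOAGE08B holds only the second set of responses (fathers whose child also has a mother's record), so that no father appears in both files.

```python
def split_parent_records(records):
    """Split parent questionnaires into hypothetical A and B files.

    records: dicts with a 'child' id and 'parent' in {'mother', 'father'}.
    File A: one record per child, the mother's if present, else the father's.
    File B (assumed rule): fathers of children whose mother also answered.
    """
    mothers = {r["child"]: r for r in records if r["parent"] == "mother"}
    fathers = {r["child"]: r for r in records if r["parent"] == "father"}
    children = sorted(set(mothers) | set(fathers))
    file_a = [mothers.get(child, fathers.get(child)) for child in children]
    file_b = [fathers[child] for child in sorted(fathers) if child in mothers]
    return file_a, file_b
```

Under these assumptions, file A has exactly one response per child, and file B supplies the second response where both parents answered.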

4.2 LIFESPELL

The LIFESPELL dataset contains data from the follow-up studies of SOEP dropouts (1992, 2001, 2006, and 2008), which were not previously included in the regular data release. The follow-up studies, based on information from public registers, serve to identify the current residence of former SOEP respondents, and thus allow studies of life expectancy and decisions to emigrate for a large percentage of SOEP respondents, even long after they have dropped out of the study. The documentation of this new dataset is included as a new chapter in our documentation on biography and life history data in SOEP (coming soon).

4.3 BIOEDU (beta version)

The BIOEDU dataset, which is being released this year in provisional form (beta release), contains details on educational transitions beginning with entrance into childcare up to tertiary education in consistently structured form. Users who work with these data are requested to report on their experiences (especially any problems they might have), so that a final version can be released next year.
Detailed documentation is in the DIW Data Documentation 58 (PDF, 383.03 KB).

Because of its provisional form, this dataset is not part of the normal distribution and you will find the data on the DVD in an extra archive (link available in the NEWS file on the DVD).

5. New Variables

5.1 $PEQUIV dataset

The $PEQUIV files contain a new additional variable on support payments. With the 2010 questionnaire, the SOEP split the item "support payments" into two separate items: the first now collects information on "alimony from legal spousal support, child support, and child care support" (ALIM$$), while the second asks about "advance child maintenance payments" (IACHM$$). More information about the $PEQUIV files and the new variables is available in the DIW Data Documentation 57 (PDF, 0.54 MB).
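Users who want a single support-payment figure alongside earlier waves might add the two new items. The sketch below does this for one person-year record, where `$$` is the two-digit survey year (ALIM10 and IACHM10 for 2010). It assumes that negative values are missing codes, as elsewhere in SOEP, and that the sum is comparable to the pre-2010 single item; both assumptions should be checked against DIW Data Documentation 57. The function name is hypothetical.

```python
def support_payments_total(record, year):
    """Sum ALIM$$ and IACHM$$ for one person-year record.

    Returns None if either component is absent or carries a negative
    value (assumed to be a SOEP missing code).
    """
    suffix = f"{year % 100:02d}"  # $$ = two-digit survey year, e.g. '10'
    parts = [record.get(f"ALIM{suffix}"), record.get(f"IACHM{suffix}")]
    # Conservative rule: any missing component makes the total missing.
    if any(part is None or part < 0 for part in parts):
        return None
    return sum(parts)
```

Returning `None` instead of treating a missing component as zero keeps item non-response visible in the combined variable.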

5.2 PFLEGE dataset

The PFLEGE file now includes two additional variables. "FURTHER" gives the number of further persons in the household requiring help; this question has been asked since 2009. "CARECOST" gives the regular monthly costs a household normally spends on care; this question has been asked since 2010.

5.3 Dataset $PGEN

We are now providing detailed data on educational degrees and training qualifications prior to joining the panel: life course questionnaires have been distributed since 2001 to collect data on apprenticeship occupation, type of qualification (e.g., diploma), and the field of study for those who have obtained a degree. Up to now, data from these open-answer questions were not included in the data release. From now on, however, these data will be released in coded form. The classifications used for the data from the individual questionnaire have been slightly modified in the process of these revisions. A more detailed description is available within the PGEN documentation.
The new variables are

FIELD$$ | Field of tertiary education
DEGREE$$ | Type of tertiary degree
TRAINA$$ | Apprenticeship: two-digit occupation (KldB92)
TRAINB$$ | Vocational school: two-digit occupation (KldB92)
TRAINC$$ | Higher vocational school: two-digit occupation (KldB92)
TRAIND$$ | Civil servant training: two-digit occupation (KldB92)
FDT_F$$ | Data source for FIELD, DEGREE, TRAIN

6. Revised Variables

6.1 $P dataset

Name changes to the occupation and sector classification variables in $P: the variables contained in the $P datasets are collected in alternate years from all respondents and from those individuals who changed occupations. At the same time, we generate this information for all years and all individuals and distribute it in the $PGEN datasets. To distinguish the generated variables more clearly from the originally surveyed variables, and to establish a clear connection to the question number in the respective questionnaire, we have renamed the variables according to the following system:

Old variable name | New variable name
$IS88 | $pXX_IS88
$KLAS | $pXX_KLAS
$NACE | $pXX_NACE

Here XX is the number of the question in the respective year's individual questionnaire. Thus, ZIS88, for example, is now ZP29_IS88. The corresponding variables of all datasets from the individual questionnaire ($P) have been renamed, but not the variables recommended for use from the generated datasets ($PGEN, e.g., IS8809 from ZP).
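Scripts written against the old names can be updated mechanically once the question number is known. A minimal sketch, covering only the classifications in the renaming table (IS88, KLAS, NACE): the question number differs by year, is not derivable from the old name, and must be looked up in that year's questionnaire. The helper name is hypothetical, and the number is inserted as given (whether single-digit numbers are zero-padded is not stated here).

```python
CLASSIFICATIONS = ("IS88", "KLAS", "NACE")


def renamed_variable(old_name, question_number):
    """Map an old $P classification name to the v27 scheme $pXX_<class>.

    old_name: e.g. 'ZIS88' (wave prefix + classification suffix)
    question_number: question number XX from that year's questionnaire
    """
    old = old_name.upper()
    for suffix in CLASSIFICATIONS:
        if old.endswith(suffix) and len(old) > len(suffix):
            wave = old[: -len(suffix)]  # everything before the suffix is the wave prefix
            return f"{wave}P{question_number}_{suffix}"
    raise ValueError(f"{old_name!r} is not a known classification variable")
```

With the example from the text, `renamed_variable("ZIS88", 29)` yields `ZP29_IS88`.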

6.2 $HBRUTTO dataset

The coding of the variable identifying the federal state ($BULA, Bundesland) in which a household was included is now consistent with the coding used in official statistics.

$BULA (old coding) | $BULA (new coding)
0 Berlin | (none)
1 Schleswig-Holstein | 1 Schleswig-Holstein
2 Hamburg | 2 Hamburg
3 Niedersachsen | 3 Niedersachsen
4 Bremen | 4 Bremen
5 Nordrhein-Westfalen | 5 Nordrhein-Westfalen
6 Hessen | 6 Hessen
7 Rheinl.-Pfalz, Saarl. | 7 Rheinland-Pfalz
8 Baden-Wuerttemberg | 8 Baden-Wuerttemberg
9 Bayern | 9 Bayern
(none) | 10 Saarland
11 Berlin (Ost) | 11 Berlin
12 Mecklenburg-Vorpommern | 12 Brandenburg
13 Brandenburg | 13 Mecklenburg-Vorpommern
14 Sachsen-Anhalt | 14 Sachsen
15 Thueringen | 15 Sachsen-Anhalt
16 Sachsen | 16 Thueringen

East and West Berlin can still be distinguished by combining $BULA with $SAMPREG (sample region, in $PPFAD).
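Code written for the old $BULA values can be harmonized with the new coding using the $BULA table. A minimal sketch: old code 7 (Rheinland-Pfalz and Saarland combined) has no unique new equivalent and is left unmapped. For the Berlin split, it assumes $SAMPREG 1 = West Germany and 2 = East Germany; check the $PPFAD value labels before relying on this. The function names are hypothetical.

```python
# Old $BULA code -> new code (official statistics), following the $BULA table.
# Old code 7 (Rheinland-Pfalz and Saarland combined) is deliberately absent.
OLD_TO_NEW_BULA = {
    0: 11,   # Berlin -> Berlin
    1: 1, 2: 2, 3: 3, 4: 4, 5: 5, 6: 6,
    8: 8, 9: 9,
    11: 11,  # Berlin (Ost) -> Berlin
    12: 13,  # Mecklenburg-Vorpommern
    13: 12,  # Brandenburg
    14: 15,  # Sachsen-Anhalt
    15: 16,  # Thueringen
    16: 14,  # Sachsen
}


def recode_bula(old_code):
    """Translate an old $BULA code; None if unknown or not unique (code 7)."""
    return OLD_TO_NEW_BULA.get(old_code)


def berlin_part(new_bula, sampreg):
    """Split new code 11 (Berlin) into West and East using $SAMPREG.

    Assumption: $SAMPREG 1 = West Germany, 2 = East Germany.
    Returns None for households outside Berlin or with other codes.
    """
    if new_bula != 11:
        return None
    return {1: "West Berlin", 2: "East Berlin"}.get(sampreg)
```

Keeping old code 7 unmapped forces an explicit decision for Rheinland-Pfalz/Saarland households rather than silently assigning them to one state.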

6.3 Minor bug fixes

  • In the BIOAGE17 file from data release v26, the variables classifying the preferred job reported in the youth questionnaire (byklas, bymps, byisco88, byegp, byisei, bysiops) were incorrectly coded and contained too many missing values. This bug has been fixed in data release v27.
  • In the dataset GGKBOU, some households in Berlin (in 2006 only) were assigned the wrong settlement pattern ($GTYP). This bug has also been fixed.

1984-2010 (Wave BA)

March 30, 2012

BIOAGE03
The age of the children was incorrect and has been recalculated. In addition, some missing values for children’s weight and height were wrongly coded as “0” and have been recoded. Finally, the number of doctor visits for the survey years 2005 and 2006 was set to “0” by mistake and has been recoded.

BIOAGE06
For 14 children, the birth month was missing although this information was available in more recent waves; it has now been added. In addition, one incorrect person number and one incorrect household number have been corrected.

BIOAGE08
The age of the children was not correct and had to be recalculated.

LIFESPELL
In the update of the dataset LIFESPELL, approximately 100 cases of emigration were recoded for the time period 2006-2010. In the original version of v27 they were incorrectly specified as living in Germany. The revised LIFESPELL file also contains new information about the year of death for a small number of individuals. For more information please contact Hannes Neiss (hneiss@diw.de). 

An update for all corrected files can be downloaded, but only by means of a personalized link. Please contact soepmail@diw.de to obtain such a link.

Please note: If you use one of the provided bugfixes in your analyses, we recommend citing it as follows:
English:
Socio-Economic Panel (SOEP), data for years 1984-2010, version 27.2, SOEP, 2012.
German:
Sozio-oekonomisches Panel (SOEP), Daten für die Jahre 1984-2010, Version 27.2, SOEP, 2012.
Short Version:
SOEP v27.2

Jan 2, 2012

COGDJ
In the file COGDJ, the 2010 data had not yet been updated in the released version. To obtain a bugfix for download, please contact soepmail@diw.de.

English labels
In the data sets ZHBRUTTO and BAHBRUTTO, some of the English labels shifted position and had to be redefined. This applies to the following variables:

ZHBRUTTO | BAHBRUTTO
SAMPLE1 | (none)
ZBULA | (none)
ZDATUMMO | (none)
ZHAND | (none)
ZHERGS | (none)
ZHTYP | BAHTYP
ZSAMPREG | (none)

Also, in the $PGEN data sets, no English value labels were generated for the new variables on educational degrees and training qualifications prior to joining the panel. This applies to the English labels for the following variables:
FIELD$$, DEGREE$$, and TRAINA$$–TRAIND$$.

If you use one of those variables, please contact soepmail@diw.de to obtain a download link for the bugfixes.

PPFADL in SOEPlong
In the SOEPlong data version distributed earlier this year, the following two variables in the file PPFADL had missing values in 2010:

HID: key identifier for households, and
NETT1: the short version of the tracking variable NETTO.

An update for PPFADL can be downloaded, but only by means of a personalized link. Please contact soepmail@diw.de to obtain such a link. 

Please note: If you use one of the provided bugfixes in your analyses, we recommend citing it as follows:
English:
Socio-Economic Panel (SOEP), data for years 1984-2010, version 27.1, SOEP, 2012.
German:
Sozio-oekonomisches Panel (SOEP), Daten für die Jahre 1984-2010, Version 27.1, SOEP, 2012.
Short Version:
SOEP v27.1.


Survey Instruments 2010: Field-de

Please find all sample-specific questionnaires of this year and all questionnaires of previous years on this site.

1) Handgreifkraftmessung im Sozio-oekonomischen Panel (SOEP) 2006 und 2008

2) The new IAB-SOEP Migration Sample: an introduction into the methodology and the contents

3) The Request for Record Linkage in the IAB-SOEP Migration Sample

4) Flowcharts for the Integrated Individual-Biography Questionnaire of the IAB-SOEP Migration Sample 2013

5) The Measurement of Labor Market Entries with SOEP Data: Introduction to the Variable EINSTIEG_ARTK

6) Job submission instructions for the SOEPremote System at DIW Berlin – Update 2014

7) SOEP 2015 – Informationen zu den SOEP-Geocodes in SOEP v32

8) Editing and Multiple Imputation of Item Non-response in the Wealth Module of the German Socio-Economic Panel

9) Die Vercodung der offenen Angaben zu den Ausbildungsberufen im Sozio-Oekonomischen Panel

10) Das Studiendesign der IAB-BAMF-SOEP Befragung von Geflüchteten

11) Scales Manual IAB-BAMF-SOEP Survey of Refugees in Germany – revised version

12) SOEP 2010 – Preparation of data from the new SOEP consumption module: Editing, imputation, and smoothing

13) SOEP Scales Manual (updated for SOEP-Core v32.1)

14) Kognitionspotenziale Jugendlicher - Ergänzung zum Jugendfragebogen der Längsschnittstudie Sozio-oekonomisches Panel (SOEP)

15) Die Vercodung der offenen Angaben zur beruflichen Tätigkeit nach der International Standard Classification of Occupations 2008 (ISCO08) - Direktvercodung - Vorgehensweise und Entscheidungsregeln bei nicht eindeutigen Angaben

16) Die Vercodung der offenen Angaben zur beruflichen Tätigkeit nach der Klassifikation der Berufe 2010 (KldB 2010): Vorgehensweise und Entscheidungsregeln bei nicht eindeutigen Angaben

17) Multi-Itemskalen im SOEP Jugendfragebogen

18) Zur Erhebung des adaptiven Verhaltens von zwei- und dreijährigen Kindern im Sozio-oekonomischen Panel (SOEP)

19) Documentation of ISCED Generation Based on the CAMCES Tool in the IAB-SOEP Migration Samples M1/M2 and IAB-BAMF-SOEP Survey of Refugees M3/M4 until 2017

20) Missing Income Data in the German SOEP: Incidence, Imputation and its Impact on the Income Distribution

21) SOEP 2006 – TIMEPREF: Dataset on the Economic Behavior Experiment on Time Preferences in the 2006 SOEP Survey

22) Assessing the distributional impact of "imputed rent" and "non-cash employee income" in microdata: Case studies based on EU-SILC (2004) and SOEP (2002)

All documentation for filtering can be found on this page.
