SOEP-Core v31 (data 1984-2014)

The German Socio-Economic Panel (SOEP) study is a wide-ranging representative longitudinal study of private households, located at the German Institute for Economic Research, DIW Berlin. The Panel was started in 1984. Each year, nearly 11,000 households and more than 20,000 persons were sampled by the fieldwork organization TNS Infratest Sozialforschung. The data provide information on all household members, consisting of Germans living in the Old and New German States, foreigners, and recent immigrants to Germany. Topics include household composition, occupational biographies, employment, earnings, health, and satisfaction indicators. As early as June 1990, even before the Economic, Social and Monetary Union, SOEP expanded to include the states of the former German Democratic Republic (GDR), thus seizing the rare opportunity to observe the transformation of an entire society. An immigrant sample was added in 1994/95 to account for the changes that had taken place in German society. Further new samples were added in 1998, 2000, 2002, 2006, 2009, 2011, 2012 and 2013. The survey is constantly being adapted and developed in response to current social developments. The international version contains 95% of all cases surveyed.

Dataset Information

Title: German Socio-Economic Panel (SOEP), data of the years 1984-2014

DOI: 10.5684/soep.v31
Collection period: 1984-2014
Publication date: November 15, 2015
Principal investigators: Jürgen Schupp, Jan Goebel, Martin Kroh, Carsten Schröder, Klaudia Erhardt, Alexandra Fedorets, Marco Giesselmann, Markus Grabka, Peter Krause, Simon Kühne, Maximilian Priem, David Richter, Rainer Siegers, Paul Schmelzer, Christian Schmitt, Daniel Schnitzlein, Ingrid Tucci, Knut Wenzig

Data collector: TNS Infratest Sozialforschung GmbH.

Population: Persons living in private households in Germany.

Selection method: All SOEP samples are regionally clustered multi-stage random samples. The respondents (households) are selected by random walk.

Collection mode: The interview methodology of the SOEP is based on a set of pre-tested questionnaires for households and individuals. In principle, an interviewer tries to obtain face-to-face interviews with all members of a given survey household aged 16 years and over. In addition, one person (the head of household) is asked to answer a household-related questionnaire covering information on housing, housing costs, and different sources of income. This questionnaire also includes some questions on children in the household up to 16 years of age, mainly concerning attendance at institutions (kindergarten, elementary school, etc.).

Data set information:

Number of units: 109,911
Number of variables: 51,203 in 365 data sets
Data format: STATA, SPSS, SAS, CSV

SOEP v31 (original dataset)

SOEP v31i (international version)

SOEP v31.1 (update)

SOEP v31.1i (update international version)

1. Integration of the FiD study (data from 2010 ongoing)

We are pleased to announce that the data release v31 will include the data from “Familien in Deutschland” (Families in Germany, FiD), which is being retrospectively integrated into the SOEP and made available in user-friendly form to all SOEP users. The survey was carried out in parallel to the SOEP as a so-called “SOEP-related study” from 2010 to 2013.

The original SOEP-related study FiD

The idea of FiD was to evaluate the full range of public benefits in Germany for married people and families on behalf of the Federal Ministry for Family Affairs. The datasets available—including the SOEP—were not sufficient for differentiated analysis of the segments of the population targeted by family policies. Particularly problematic were the very small percentages of single parents, families with more than two children, low-income families, and families with very young children in the German population. These groups are of course included in the SOEP, but the number of observations is too small for sound statistical analysis.

Since 2010, the SOEP Research Infrastructure at DIW Berlin has been working in collaboration with TNS Infratest Sozialforschung to survey more than 4,500 households every year. The FiD sample consists of the following subsamples:

  • A sample of families in “critical income brackets”
  • A sample of single parents
  • A sample of families with more than two children
  • “Cohort samples” of the 2007, 2008, 2009, and 2010 (first quarter) birth cohorts.

A description of the original FiD study can be found in the article “Familien in Deutschland – FiD” by Mathis Schröder, Rainer Siegers, and C. Katharina Spieß, Schmollers Jahrbuch 133 (4), 2013, 595-606 (pre-published 2013 as SOEPpapers 556, Berlin: DIW Berlin).

Integration into SOEP-Core

Starting with Version 31 of the data, the FiD sample will be integrated completely into the SOEP-Core data, that is, as if it were a new sample drawn as part of SOEP-Core in 2010 and 2011. The integration of the FiD sample will increase the number of cases in SOEP-Core since 2010 by almost one-third. The figure shows how the new FiD samples L1 to L3 have affected cross-sectional sample size since 2010. Because other subsamples have also been added to SOEP-Core since 2010, the retrospective integration meant that the sample variables had to be adjusted (see 3.1 Adjustment of the psample / hsample variables).


In total, 14,166 variables from 64 datasets have been integrated into the various SOEP datasets, and the generated data sets or variables have been adjusted. Variables in the FiD survey instruments that were not contained in the corresponding SOEP survey instruments have been included in the respective datasets as additional variables (with the original FiD variable names starting with “fyy”, where “yy” is a two-digit year identifier). The table below gives an overview of the number of variables in each of the two main questionnaires that could be integrated.

Number of variables integrated

Year   Individual questionnaire (–p)   Household questionnaire (–h)
2010   314                             274
2011   472                             172
2012   350                             188
2013   363                             169

This means that from 2010 on, SOEP users automatically have more cases in their study population without having to make any changes to their scripts. Of course, certain variables may not have been collected in FiD and are therefore unavailable for these cases. Here, please refer to our conventional missing-value codes, which make this easy to see at the variable level:

Code Meaning
-1 no answer / don’t know
-2 does not apply
-3 implausible value
-4 Inadmissible multiple response
-5 Not included in this version of the questionnaire
-6 Version of questionnaire with modified filtering
-8 Question not part of the survey program this year*

*Only applicable for datasets in long format.
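
As a rough illustration of how these codes might be handled in an analysis script, the following Python sketch separates valid values from coded missing values and counts each missing code. The function name and the example values are hypothetical, and the sketch assumes the variable in question has no legitimate negative values that collide with the codes.

```python
# Missing-value codes as listed in the table above.
SOEP_MISSING_CODES = {
    -1: "no answer / don't know",
    -2: "does not apply",
    -3: "implausible value",
    -4: "inadmissible multiple response",
    -5: "not included in this version of the questionnaire",
    -6: "version of questionnaire with modified filtering",
    -8: "question not part of the survey program this year",
}


def split_valid_and_missing(values):
    """Return (valid_values, missing_counts) for a list of raw codes.

    Hypothetical helper: assumes that any value equal to one of the
    codes above is a missing-value code, not a real measurement.
    """
    valid = []
    missing_counts = {}
    for value in values:
        if value in SOEP_MISSING_CODES:
            missing_counts[value] = missing_counts.get(value, 0) + 1
        else:
            valid.append(value)
    return valid, missing_counts


# Made-up raw values of a single variable, for illustration only.
raw = [3, -5, 7, -1, -5, 10]
valid, missing = split_valid_and_missing(raw)
```

In this made-up example, a high count for code -5 would indicate that the variable was not part of the questionnaire version used for many of the cases, which is the situation described above for variables not collected in FiD.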

2. Cross-sectional weights 2014

The Federal Statistical Office plans to adjust the already-released Microcensus data from 2011 and 2012 based on the 2011 census data. This means that in the present SOEP data release (v31), the weights for waves BB and BC will change due to the adjustment to the 2011 census data.

Because v31 will include the data from the SOEP-related study FiD, the overall number of cases will increase by around one-third, which will also affect the integrated weighting variables. This is due both to the additional households and to the differentiated consideration of official information on family types in the weighting process. To allow users to test how a new sample may affect their research using the SOEP data, we provide both integrated weights and separate weights for the old and new samples in the year when a refresher sample was integrated into the SOEP.

3. Other changes

3.1 Adjustment of the psample / hsample variables

Due to the retrospective integration of the FiD sample, the psample variable in ppfad and the corresponding hsample variable in hpfad had to be adjusted.

sample variables

Value  Old label (v30)           New label (v31)
1      A German West             A Original Sample (DE-West)
2      B Foreigner West          B Migration (up to 1983, DE-West)
3      C German East             C Original Sample (DE-East)
4      D 84-93 Immigrant (West)  D 1994/5 Migration (1984-92/94 DE-West)
5      E Refreshment 1998        E 1998 Refreshment
6      F ISOEP 2000              F 2000 Refreshment
7      G High-Income Test 2002   G 2002 High-Income
8      H Refreshment 2006        H 2006 Refreshment
9      I Incentives 2009         I 2009 Incentivization
10     J Refreshment 2011        J 2011 Refreshment
11     K Refreshment 2012        K 2012 Refreshment
12     -                         L1 2010 Birth Cohorts (2007-2009)
13     M Migration 2013          L2 2010 Family Types
14     -                         L3 2011 Family Types
15     -                         M1 2013 Migration (1995-2010)
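
Scripts written against v30 that select cases by sample value may need attention, because value 13 changes its meaning. The Python sketch below shows one way to express the relabeling. It assumes, as the labels in the table suggest, that the old value 13 (M Migration 2013) corresponds to the new value 15 (M1 2013 Migration) and that values 1 to 11 keep their numeric codes. The dictionary and function names are hypothetical.

```python
# New v31 labels for psample / hsample, copied from the table above.
V31_SAMPLE_LABELS = {
    1: "A Original Sample (DE-West)",
    2: "B Migration (up to 1983, DE-West)",
    3: "C Original Sample (DE-East)",
    4: "D 1994/5 Migration (1984-92/94 DE-West)",
    5: "E 1998 Refreshment",
    6: "F 2000 Refreshment",
    7: "G 2002 High-Income",
    8: "H 2006 Refreshment",
    9: "I 2009 Incentivization",
    10: "J 2011 Refreshment",
    11: "K 2012 Refreshment",
    12: "L1 2010 Birth Cohorts (2007-2009)",
    13: "L2 2010 Family Types",
    14: "L3 2011 Family Types",
    15: "M1 2013 Migration (1995-2010)",
}

# Assumption: v30 value 13 (sample M) maps to v31 value 15 (sample M1);
# v30 values 1 to 11 are unchanged. Value 12 carried no label in v30.
V30_TO_V31_VALUE = {value: value for value in range(1, 12)}
V30_TO_V31_VALUE[13] = 15

# The integrated FiD samples L1 to L3.
FID_SAMPLE_VALUES = {12, 13, 14}


def recode_v30_sample(value):
    """Translate a v30 sample value into the v31 numbering."""
    return V30_TO_V31_VALUE[value]


def is_fid_sample(v31_value):
    """Return True if a v31 sample value belongs to the FiD samples."""
    return v31_value in FID_SAMPLE_VALUES
```

For example, a v30 script that filtered on value 13 to select the 2013 migration sample would, under this assumption, need to filter on `recode_v30_sample(13)` in v31.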

3.2. Biographical data sets

The following datasets with biographical information were pooled to keep the number of life-course-related datasets to a reasonable level:

biobirth and biobirthm -> biobirth
As of v31, women’s (biobirth) and men’s (biobirthm) childbirth biographies are merged into the single dataset biobirth, which includes a gender variable.

bioage01 to bioage12 -> bioagel
Starting with data distribution v31, the age-specific data from the mother/parent-child questionnaires are provided only in the user-friendly “long” format: Rather than as age-specific individual files (e.g., bioage01, bioage03, ...), all mother-child and parent-child questionnaires are now pooled in the bioagel dataset. Consequently, all information on children can now easily be found in one dataset. The documentation on the biographical data includes a syntax to generate the age-specific individual files for those who do need them and information on how to use the new bioagel “long” data set most efficiently with SPSS and Stata.

The dataset bioage17 derived from the youth questionnaire is not included in this bioagel dataset.
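
The official syntax for regenerating the age-specific files is part of the documentation on the biographical data. Purely to illustrate the idea of going from a pooled "long" dataset back to separate files, here is a minimal Python sketch. The column names (`child_id`, `questionnaire`, `value`) and the example rows are hypothetical and do not reflect the actual variable names in bioagel.

```python
from collections import defaultdict


def split_long_by_group(rows, group_key):
    """Split pooled long-format rows into one list of rows per group.

    rows      -- list of dicts, one dict per child-questionnaire record
    group_key -- name of the column identifying the questionnaire
                 (a hypothetical column name in this sketch)
    """
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_key]].append(row)
    return dict(groups)


# Made-up pooled records; the real bioagel variables differ.
pooled = [
    {"child_id": 1, "questionnaire": "bioage01", "value": 5},
    {"child_id": 1, "questionnaire": "bioage03", "value": 6},
    {"child_id": 2, "questionnaire": "bioage01", "value": 4},
]

per_questionnaire = split_long_by_group(pooled, "questionnaire")
```

Each entry of `per_questionnaire` then plays the role of one of the former age-specific files, while analyses that follow a child across questionnaires can work on the pooled rows directly.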

3.3 Changes in $HGEN

The file HGEN v31.1 now contains the variable gas$$, which states the household’s gas costs starting in 2014. The variables $$eqplif and $$eqpnrj have now been carried forward from the last two years if a household did not provide a response in a given year.

3.4 Other changes in SOEP v31.1

The updates in v31.1 only affected the values of various variables. For detailed information, please see the DOI landing page soep.v31.1.

1984-2014 (Wave BE)

June 6, 2016

In the file with generated longitudinal data on children (KIDLONG) in SOEP-Core v31.1, another correction had to be implemented: a few data items that had been asked only in the FiD study were missing.

This applies to the variables KA06$$ (Activities for children below the age of 6) and KA16$$ (Activities for children aged 6 to 16).

If you want to analyze these variables there are three ways to use the corrected data:

  1. Use the original data from SOEP-Core (in the files $$KIND).
  2. Use the data set KIDL in SOEPlong (there the data were implemented correctly).
  3. Ask for the corrected data set KIDLONG at our hotline. We can provide you with an individualized download link.

March 18, 2016

Various updates forced us to distribute a new version. Please see the DOI landing page soep.v31.1 for the documentation of the changes.

