SOEP-Core v31 - Dataset Information

The German Socio-Economic Panel (SOEP) study is a wide-ranging representative longitudinal study of private households, located at the German Institute for Economic Research, DIW Berlin. Every year, nearly 11,000 households and more than 20,000 persons are sampled by the fieldwork organization TNS Infratest Sozialforschung. The data provide information on all household members, consisting of Germans living in the Old and New German States, foreigners, and recent immigrants to Germany. The Panel was started in 1984. Topics include household composition, occupational biographies, employment, earnings, health, and satisfaction indicators. As early as June 1990, even before the Economic, Social and Monetary Union, SOEP expanded to include the states of the former German Democratic Republic (GDR), thus seizing the rare opportunity to observe the transformation of an entire society. An immigrant sample was added in 1994/95 to account for the changes that had taken place in German society. Further new samples were added in 1998, 2000, 2002, 2006, 2009, 2011, 2012 and 2013. The survey is constantly being adapted and developed in response to current social developments. The international version contains 95% of all cases surveyed.

Dataset Information

Title: German Socio-Economic Panel (SOEP), data of the years 1984-2014

DOI: 10.5684/soep.v31
Collection period: 1984-2014
Publication date: November 15, 2015
Principal investigators: Jürgen Schupp, Jan Goebel, Martin Kroh, Carsten Schröder, Klaudia Erhardt, Alexandra Fedorets, Marco Giesselmann, Markus Grabka, Peter Krause, Simon Kühne, Maximilian Priem, David Richter, Rainer Siegers, Paul Schmelzer, Christian Schmitt, Daniel Schnitzlein, Ingrid Tucci, Knut Wenzig

Data collector: TNS Infratest Sozialforschung GmbH.

Population: Persons living in private households in Germany.

Selection method: All samples of SOEP are multi-stage random samples which are regionally clustered. The respondents (households) are selected by random-walk.

Collection mode: The interview methodology of the SOEP is based on a set of pre-tested questionnaires for households and individuals. In principle, an interviewer tries to obtain face-to-face interviews with all members of a given survey household aged 16 years and over. In addition, one person (the head of household) is asked to answer a household-related questionnaire covering information on housing, housing costs, and different sources of income. This questionnaire also includes some questions on children in the household up to 16 years of age, mainly concerning attendance at institutions (kindergarten, elementary school, etc.).

Data set information:

Number of units: 109,911
Number of variables: 51,203 in 365 data sets
Data format: STATA, SPSS, SAS, CSV

MD5 fingerprints

Distribution format        zip file                          all files
Stata bilingual            d2941d8149afa7a8810e8003c59771ad  TXT, 15.95 KB
Stata German               814032bdcdd2c8b0602f32cff045b294  TXT, 15.95 KB
Stata English              d1a2d2b5ba69105f7cf7cc6a2c6ca229  TXT, 15.95 KB
SPSS German                732e9dd01798f9ddf0d2f6435cdfb818  TXT, 15.95 KB
SPSS English               b240383e2c45247d0ededff519484438  TXT, 15.95 KB
SAS German                 3915948e00494a59df0a77f4d99cd400  TXT, 17.83 KB
SAS English                fb13d618ced783384d4796f0e8dc6d2d  TXT, 17.83 KB
CSV                        54cc73b250804d362f1d1c86f21eddc2  TXT, 15.95 KB
GGKBOU German              2ba2f7ab801dc6dd6d1000b3603d1ecc  TXT, 140 Byte
GGKBOU English             c680879d0e382f60cb4f0f65092c4d1a  TXT, 140 Byte

Teaching version
Stata German (teaching)    6f820e78e0bd16d47aa79de97e615f78  TXT, 15.95 KB
Stata English (teaching)   debcfe71b12527962f3f3db41ae091ee  TXT, 15.95 KB
SPSS German (teaching)     61daf2301a20c5b354ef0d18732339c8  TXT, 15.95 KB
SPSS English (teaching)    987ba29cda3169842a2bd093c2cf19fa  TXT, 15.95 KB
SAS German (teaching)      44b8bd0c351f811650d44595b61752c4  TXT, 17.78 KB
SAS English (teaching)     4fb88bd0a304719699a98c3a9fd9f8b6  TXT, 17.78 KB
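
The fingerprints listed above can be used to check that a downloaded zip file arrived intact. The following is a minimal sketch in Python using only the standard library; the function names and the file name in the usage note are illustrative assumptions and not part of the SOEP distribution.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hexadecimal MD5 digest of the file at `path`.

    The file is read in chunks so that large zip archives do not
    have to fit into memory at once.
    """
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_fingerprint(path, expected_hex):
    """Compare a file's MD5 digest with a published fingerprint.

    Surrounding whitespace and letter case in `expected_hex` are ignored.
    """
    return md5_of_file(path) == expected_hex.strip().lower()
```

For example, assuming the English Stata distribution was saved under the hypothetical name soep_v31_stata_en.zip, the call matches_fingerprint("soep_v31_stata_en.zip", "d1a2d2b5ba69105f7cf7cc6a2c6ca229") should return True for an undamaged download.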

Publications:

  • Gert G. Wagner, Joachim R. Frick, and Jürgen Schupp (2007): The German Socio-Economic Panel Study (SOEP) - Scope, Evolution and Enhancements, Schmollers Jahrbuch (Journal of Applied Social Science Studies) 127 (1), 139-169.
  • Jürgen Schupp (2009): 25 Jahre Sozio-oekonomisches Panel - Ein Infrastrukturprojekt der empirischen Sozial- und Wirtschaftsforschung in Deutschland, Zeitschrift für Soziologie 38 (5), 350-357.
  • Gert G. Wagner, Jan Göbel, Peter Krause, Rainer Pischner, and Ingo Sieber (2008): Das Sozio-oekonomische Panel (SOEP): Multidisziplinäres Haushaltspanel und Kohortenstudie für Deutschland - Eine Einführung (für neue Datennutzer) mit einem Ausblick (für erfahrene Anwender), AStA Wirtschafts- und Sozialstatistisches Archiv 2 (4), 301-328.

SOEP v31 (original dataset)

SOEP v31i (international version)

SOEP v31.1 (update)

SOEP v31.1i (update international version)

1. Integration of the FiD study (data from 2010 ongoing)

We are pleased to announce that data release v31 includes the data from “Familien in Deutschland” (Families in Germany, FiD), which have been retrospectively integrated into the SOEP and made available in user-friendly form to all SOEP users. The survey was carried out in parallel to the SOEP as a so-called “SOEP-related study” from 2010 to 2013.

The original SOEP-related study FiD

The idea of FiD was to evaluate the full range of public benefits in Germany for married people and families on behalf of the Federal Ministry for Family Affairs. The datasets available—including the SOEP—were not sufficient for differentiated analysis of the segments of the population targeted by family policies. Particularly problematic were the very small percentages of single parents, families with more than two children, low-income families, and families with very young children in the German population. These groups are of course included in the SOEP, but the number of observations is too small for sound statistical analysis.

Since 2010, the SOEP Research Infrastructure at DIW Berlin has been working in collaboration with TNS Infratest Sozialforschung to survey more than 4,500 households every year. The FiD sample consists of the following subsamples:

  • A sample of families in “critical income brackets”
  • A sample of single parents
  • A sample of families with more than two children
  • “Cohort samples” of the 2007, 2008, 2009, and 2010 (first quarter) birth cohorts.

A description of the original FiD study can be found in the article “Familien in Deutschland – FiD” by Mathis Schröder, Rainer Siegers, and C. Katharina Spieß, Schmollers Jahrbuch 133 (4), 2013, 595-606 (http://dx.doi.org/10.3790/schm.133.4.595). Pre-published in 2013 as SOEPpapers 556, Berlin: DIW Berlin.

Integration into SOEP-Core

Starting with Version 31 of the data, the FiD sample is integrated completely into the SOEP-Core data, that is, as if it were a new sample drawn as part of SOEP-Core in 2010 and 2011. The integration of the FiD sample increases the number of cases in SOEP-Core since 2010 by almost one-third. The figure shows how the new FiD samples L1 to L3 have affected cross-sectional sample size since 2010. Because of the retrospective integration, the sample variables had to be adjusted, as other subsamples have been added to SOEP-Core since 2010 (see Section 3.1, Adjustment of the psample / hsample variables).

[Figure: Sample development (Stichprobenentwicklung)]

In total, 14,166 variables from 64 datasets have been integrated into the various SOEP datasets, and the generated data sets or variables have been adjusted. Variables in the FiD survey instruments that were not contained in the corresponding SOEP survey instruments have been included in the respective datasets as additional variables (with the original FiD variable names starting with “fyy”, where “yy” is a two-digit year identifier). The table below gives an overview of the number of variables in each of the two main questionnaires that could be integrated.

Number of variables integrated, by questionnaire:

Year   Individual questionnaire (–p)   Household questionnaire (–h)
2010   314                             274
2011   472                             172
2012   350                             188
2013   363                             169

This means that from 2010 on, SOEP users have more cases in their study population (automatically, as it were) without having to make any changes in scripts. Of course, certain variables may not have been collected in FiD and are therefore unavailable for these cases. Here, please refer to our standard missing-value codes, which make this easy to see at the variable level:

Code Meaning
-1 no answer / don’t know
-2 does not apply
-3 implausible value
-4 Inadmissible multiple response
-5 Not included in this version of the questionnaire
-6 Version of questionnaire with modified filtering
-8 Question not part of the survey program this year*

*Only applicable for datasets in long format.
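
When working with the data outside Stata, SPSS, or SAS, the negative codes above usually need to be treated as missing before computing statistics. The sketch below shows one way to do this in plain Python. The function names are illustrative, and the approach assumes that the variable in question has no legitimate values equal to one of the listed codes.

```python
# Missing-value codes as listed in the table above.
MISSING_CODES = {
    -1: "no answer / don't know",
    -2: "does not apply",
    -3: "implausible value",
    -4: "inadmissible multiple response",
    -5: "not included in this version of the questionnaire",
    -6: "version of questionnaire with modified filtering",
    -8: "question not part of the survey program this year",
}


def mask_missing(values):
    """Replace every listed missing code with None, keep all other values."""
    return [None if v in MISSING_CODES else v for v in values]


def count_missing(values):
    """Count how often each listed missing code occurs in `values`."""
    counts = {}
    for v in values:
        if v in MISSING_CODES:
            counts[v] = counts.get(v, 0) + 1
    return counts
```

A high count of code -5 for cases from samples L1 to L3 would be one indication that a variable was not collected in the FiD questionnaires.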

2. Cross-sectional weights 2014

The Federal Statistical Office plans to adjust the already-released Microcensus data from 2011 and 2012 based on the 2011 census data. This means that in the present SOEP data release (v31), the weights for waves BB and BC will change due to the adjustment to the 2011 census data.

Because v31 will include the data from the SOEP-related study FiD, the integration of these households into the SOEP will increase the overall case number by around one-third and it will also affect the integrated weighting variables. This is due to the additional households as well as to the differentiated consideration of official information on family types in the weighting process. To allow users to test how a new sample may affect their research using the SOEP data, we provide both integrated weights and also separate weights for the old and new samples in the year when a refresher sample was integrated into the SOEP.

3. Other changes

3.1 Adjustment of the psample / hsample variables

Due to the retrospective integration of the FiD sample, the psample variable in ppfad and the corresponding hsample variable in hpfad had to be adjusted.

sample variables

Value  Old label (v30)           New label (v31)
1      A German West             A Original Sample (DE-West)
2      B Foreigner West          B Migration (up to 1983, DE-West)
3      C German East             C Original Sample (DE-East)
4      D 84-93 Immigrant (West)  D 1994/5 Migration (1984-92/94 DE-West)
5      E Refreshment 1998        E 1998 Refreshment
6      F ISOEP 2000              F 2000 Refreshment
7      G High-Income Test 2002   G 2002 High-Income
8      H Refreshment 2006        H 2006 Refreshment
9      I Incentives 2009         I 2009 Incentivization
10     J Refreshment 2011        J 2011 Refreshment
11     K Refreshment 2012        K 2012 Refreshment
12     -                         L1 2010 Birth Cohorts (2007-2009)
13     M Migration 2013          L2 2010 Family Types
14     -                         L3 2011 Family Types
15     -                         M1 2013 Migration (1995-2010)
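
Scripts written for v30 that filter on psample or hsample may need adjusting, because value 13 now refers to a different sample. The following Python sketch recodes v30 values to their v31 equivalents. It assumes, based on the labels in the table above, that the old sample M (value 13) corresponds to the new sample M1 (value 15) and that values 1 to 11 are unchanged; the function name is illustrative.

```python
# Mapping from v30 sample values to v31 sample values.
# Assumption: old value 13 ("M Migration 2013") corresponds to
# new value 15 ("M1 2013 Migration (1995-2010)").
V30_TO_V31 = {value: value for value in range(1, 12)}
V30_TO_V31[13] = 15


def recode_sample_v30_to_v31(value):
    """Translate a v30 psample/hsample value into the v31 coding."""
    try:
        return V30_TO_V31[value]
    except KeyError:
        raise ValueError(f"{value!r} is not a v30 sample value") from None
```

Values 12, 13, and 14 in v31 (samples L1 to L3) have no counterpart in v30, so v30 scripts cannot refer to them.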




3.2 Biographical data sets

The following datasets with biographical information were pooled to keep the number of life-course-related datasets to a reasonable level:

biobirth and biobirthm -> biobirth
Women’s (biobirth) and men’s (biobirthm) childbirth biographies are merged into the dataset biobirth as of v31, of course along with a gender variable.

bioage01 to bioage12 -> bioagel
Starting with data distribution v31, the age-specific data from the mother/parent-child questionnaires are provided only in the user-friendly “long” format: Rather than as age-specific individual files (e.g., bioage01, bioage03, ...), all mother-child and parent-child questionnaires are now pooled in the bioagel dataset. Consequently, all information on children can now easily be found in one dataset. The documentation on the biographical data includes a syntax to generate the age-specific individual files for those who do need them and information on how to use the new bioagel “long” data set most efficiently with SPSS and Stata.

The dataset bioage17 derived from the youth questionnaire is not included in this bioagel dataset.

3.3 Changes in $HGEN

The file HGEN v31.1 now contains the variable gas$$, which states the household’s gas costs starting in 2014. The variables $$eqplif and $$eqpnrj have now been carried forward from the last two years if a household did not provide a response in a given year.
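
The carry-forward rule described above can be illustrated with a short Python sketch. This is only one possible reading of the rule, not the procedure actually used to generate HGEN: it assumes that a missing response is filled with the most recent observed value from at most two preceding years, and that only originally observed values (not values that were themselves carried forward) are used. The function name and the use of None for a missing response are illustrative.

```python
def carry_forward(values, max_gap=2):
    """Fill missing entries with the last observed value.

    `values` holds one entry per consecutive survey year, with None
    marking a year without a response. A missing entry is filled from
    the nearest originally observed value at most `max_gap` years back;
    otherwise it stays None.
    """
    filled = list(values)
    for i, value in enumerate(values):
        if value is not None:
            continue
        for lag in range(1, max_gap + 1):
            j = i - lag
            if j >= 0 and values[j] is not None:
                filled[i] = values[j]
                break
    return filled
```

With this reading, a household that answered in one year and then skipped three years would have the first two gaps filled and the third left missing.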

3.4 Other changes in SOEP v31.1

The updates in v31.1 affected only the values of various variables. For detailed information, please see the DOI landing page soep.v31.1.

1984-2014 (Wave BE)

June 6, 2016

In the file with generated longitudinal data on children (KIDLONG) in SOEP-Core v31.1, another correction had to be implemented: some data that had been collected only in the FiD study were missing.

This applies to the variables KA06$$ (Activities for children below the age of 6) and KA16$$ (Activities for children aged 6 to 16).

If you want to analyze these variables, there are three ways to obtain the corrected data:

  1. Use the original data from SOEP-Core (in the files $$KIND).
  2. Use the data set KIDL in SOEPlong (there the data were implemented correctly).
  3. Request the corrected data set KIDLONG from our hotline (soepmail@diw.de). We can provide you with an individualized download link.

March 18, 2016

Various updates forced us to distribute a new version. Please see the DOI landing page soep.v31.1 for the documentation of the changes.


Individual (PAPI) 2014: Field-de
Household (PAPI) 2014: Field-de
Biography (PAPI) 2014: Field-de
Youth (16-17 year-olds) 2014: Field-de
Pre-Teen (11-12 year-olds) 2015: Field-de
Mother and Child (Newborns) 2014: Field-de
Mother and Child (2-3-year-olds) 2014: Field-de
Mother and Child (5-6-year-olds) 2014: Field-de
Parents and Child (7-8-year-olds) 2014: Field-de
Mother and Child (9-10-year-olds) 2014: Field-de
Deceased Individual 2014: Field-de

Please find all sample specific questionnaires of this year and all questionnaires of previous years on this site

1) SOEP 2014 – Documentation on Biography and Life History Data for SOEP v31 and v31.1

2) Documentation of Sample Sizes and Panel Attrition in the German Socio Economic Panel (SOEP) (1984 until 2014)

3) SOEP 2014 – Documentation of the Person-Related Meta-Dataset PPFAD for SOEP v31

4) SOEP 2014 – Documentation of the Person-related Meta-dataset PPFAD for SOEP v31.1

5) SOEP 2014 – Documentation of the Household-Related Meta-Dataset HPFAD for SOEP v31

6) SOEP 2014 – Documentation of the Household-related Meta-dataset HPFAD for SOEP v31.1

7) SOEP 2014 – Documentation of Person-Related Status and Generated Variables in PGEN for SOEP v31

8) SOEP 2014 – Documentation of Person-related Status and Generated Variables in $PGEN for SOEP v31.1

9) SOEP 2014 – Documentation of Household-Related Status and Generated Variables in HGEN for SOEP v31

10) SOEP 2014 – Documentation of Household-related Status and Generated Variables in $HGEN for SOEP v31.1

11) SOEP 2014 – Codebook for the $PEQUIV File 1984-2014: CNEF Variables with Extended Income Information for the SOEP

12) SOEP 2014 – Documentation of the Person-Related Meta-Dataset HEALTH for SOEP v31

13) SOEP 2014 – Documentation of the Person-related Meta-dataset HEALTH for SOEP v31.1

14) SOEP 2014 – Documentation of Person-related Variables on Children in BEKIND for SOEP v31.1

15) SOEP 2014 – Documentation of the Pooled Dataset on Children in KIDLONG for SOEP v31.1

16) SOEP 2014 – Documentation of the Dataset INTERVIEWER: Detailed Information on SOEP Interviewers for SOEP v31

1) Handgreifkraftmessung im Sozio-oekonomischen Panel (SOEP) 2006 und 2008

2) Documentation on ISCED Generation Using the CAMCES Tool in the IAB-SOEP Migration Samples M1/M2

3) The new IAB-SOEP Migration Sample: an introduction into the methodology and the contents

4) The Request for Record Linkage in the IAB-SOEP Migration Sample

5) Flowcharts for the Integrated Individual-Biography Questionnaire of the IAB-SOEP Migration Sample 2013

6) The Measurement of Labor Market Entries with SOEP Data: Introduction to the Variable EINSTIEG_ARTK

7) Job submission instructions for the SOEPremote System at DIW Berlin – Update 2014

8) SOEP 2015 – Informationen zu den SOEP-Geocodes in SOEP v32

9) Editing and Multiple Imputation of Item Non-response in the Wealth Module of the German Socio-Economic Panel

10) Die Vercodung der offenen Angaben zu den Ausbildungsberufen im Sozio-Oekonomischen Panel

11) Das Studiendesign der IAB-BAMF-SOEP Befragung von Geflüchteten

12) Scales Manual IAB-BAMF-SOEP Survey of Refugees in Germany – revised version

13) SOEP 2010 – Preparation of data from the new SOEP consumption module: Editing, imputation, and smoothing

14) SOEP Scales Manual (updated for SOEP-Core v32.1)

15) Kognitionspotenziale Jugendlicher - Ergänzung zum Jugendfragebogen der Längsschnittstudie Sozio-oekonomisches Panel (SOEP)

16) Die Vercodung der offenen Angaben zur beruflichen Tätigkeit nach der International Standard Classification of Occupations 2008 (ISCO08) - Direktvercodung - Vorgehensweise und Entscheidungsregeln bei nicht eindeutigen Angaben

17) Die Vercodung der offenen Angaben zur beruflichen Tätigkeit nach der Klassifikation der Berufe 2010 (KldB 2010): Vorgehensweise und Entscheidungsregeln bei nicht eindeutigen Angaben

18) Multi-Itemskalen im SOEP Jugendfragebogen

19) Zur Erhebung des adaptiven Verhaltens von zwei- und dreijährigen Kindern im Sozio-oekonomischen Panel (SOEP)

20) Documentation of ISCED Generation Based on the CAMCES Tool in the IAB-SOEP Migration Samples M1/M2 and IAB-BAMF-SOEP Survey of Refugees M3/M4 until 2017

21) Missing Income Data in the German SOEP: Incidence, Imputation and its Impact on the Income Distribution

22) SOEP 2013 – Documentation of Generated Person-Level Long-Term Care Variables in PFLEGE

23) SOEP-Core v34 – PFLEGE: Documentation of Generated Person-level Long-term Care Variables

24) SOEP 2006 – TIMEPREF: Dataset on the Economic Behavior Experiment on Time Preferences in the 2006 SOEP Survey

25) SOEP-Core v34: Codebook for the EU-SILC-Like Panel for Germany Based on the SOEP

26) Assessing the distributional impact of "imputed rent" and "non-cash employee income" in microdata : Case studies based on EU-SILC (2004) and SOEP (2002)

All documentation, with filter options, can be found on this page