SOEP-Core v32 (data 1984-2015)

The German Socio-Economic Panel (SOEP) study is a wide-ranging, representative longitudinal study of private households, based at the German Institute for Economic Research (DIW Berlin). Every year, nearly 15,000 households and more than 25,000 persons are surveyed by the fieldwork organization TNS Infratest Sozialforschung. The data provide information on all household members, covering Germans living in the eastern and western German states, foreigners, and immigrants to Germany. The panel was started in 1984. Topics include household composition, occupational biographies, employment, earnings, health, and satisfaction indicators. As early as June 1990, even before the Economic, Social and Monetary Union, SOEP expanded to include the states of the former German Democratic Republic (GDR), thus seizing the rare opportunity to observe the transformation of an entire society. Immigrant samples were added in 1994/95 and 2013/2015 to account for the changes that took place in German society. Further new samples were added in 1998, 2000, 2002, 2006, 2009, 2010, 2011, and 2012. Since Version 31 (10.5684/soep.v31), the SOEP has included the complete data from “Familien in Deutschland” (Families in Germany, FiD), which has been retrospectively integrated into the SOEP and made available in user-friendly form to all SOEP users. The FiD survey was carried out in parallel to the SOEP as a so-called “SOEP-related study” from 2010 to 2013. The survey is constantly being adapted and developed in response to current social developments. The international version contains 95% of all cases surveyed (see 10.5684/soep.v32i).

Dataset Information

Title: Socio-Economic Panel (SOEP), data from 1984-2015

DOI: 10.5684/soep.v32
Collection period: 1984-2015
Publication date: December 14, 2016
Principal investigators: Jürgen Schupp, Jan Goebel, Martin Kroh, Carsten Schröder, Charlotte Bartels, Klaudia Erhardt, Alexandra Fedorets, Marco Giesselmann, Markus Grabka, Peter Krause, Simon Kühne, David Richter, Diana Schacht, Paul Schmelzer, Christian Schmitt, Daniel Schnitzlein, Rainer Siegers, Knut Wenzig

Data collector: TNS Infratest Sozialforschung GmbH.

Population: Persons living in private households in Germany.

Selection method: All SOEP samples are multi-stage random samples that are regionally clustered. The respondents (households) are selected by random walk.

Collection mode: The interview methodology of the SOEP is based on a set of pre-tested questionnaires for households and individuals. In principle, an interviewer tries to obtain face-to-face interviews with all members of a given survey household aged 16 years and over. In addition, one person (the head of household) is asked to answer a household-related questionnaire covering information on housing, housing costs, and different sources of income. This questionnaire also includes some questions on children in the household up to 16 years of age, mainly concerning attendance at institutions (kindergarten, elementary school, etc.).

Data set information:

Number of units: 113,840
Number of variables: 61,902 in 413 data sets
Data format: STATA, SPSS, SAS, CSV

SOEP-Core soep.v32.1

  • BIOCOUPLY and BIOMARSY: In the first version of the data delivery, the wrong data were uploaded for these two datasets by mistake. This version contains the correct datasets.
  • NACE in BFP and BFPGEN: A user reported implausible values for the variables BFP55_NACE and NACE15, which contain information on the industry of the current job. In this version, the information has been updated after a bug in the generating script was fixed.
  • Scale shift in BFP: In the v32 data release, the scales in BFP on the probability of specific events occurring in working life, which in previous years had been coded from 0-100 at 10-point intervals, were given on a scale from 0-10 for the CAPI and CAWI interviews. This inconsistency was corrected in the update by adapting the scales to the previously used coding: the scales bfp4201, bfp4202, bfp4203, bfp7201, bfp7202, and bfp7203 were multiplied by 10 where bfpinta = 9 or 10; in addition, one case in bfp7201 was changed from 4 to 40 where bfpinta = 8.
  • einstieg_artk and einstieg_pbio: Since data version 32, SOEP has offered two additional labor market entry variables as part of the BIOJOB file. They were constructed from employment history information accurate to the year and month. They refer to a generic, uniform definition of the first survey period after the transition from the educational system to the labor market. The construction of these variables is documented in detail in SOEP Survey Paper 429; a short version of the description is also available in the BIOJOB documentation (SOEP Survey Paper 418).
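
The scale correction described in the third item can be sketched as follows for users who still work with the original v32 files. This is a minimal illustration and not the script used by the SOEP team: rows are plain dictionaries, the helper name rescale_bfp_probabilities is hypothetical, and leaving negative values untouched (on the assumption that they are missing codes) is not stated in the release note. The single bfp7201 case with bfpinta = 8 is not handled, because the note does not identify it.

```python
# Variables named in the v32.1 release note.
SCALE_VARS = ["bfp4201", "bfp4202", "bfp4203", "bfp7201", "bfp7202", "bfp7203"]
# Values of bfpinta for which the note says the 0-10 scale was delivered.
AFFECTED_MODES = {9, 10}


def rescale_bfp_probabilities(row):
    """Return a copy of row with 0-10 answers mapped back to the 0-100 scale."""
    fixed = dict(row)
    if fixed.get("bfpinta") in AFFECTED_MODES:
        for name in SCALE_VARS:
            value = fixed.get(name)
            # Assumption: negative values are missing codes and stay as they are.
            if value is not None and value >= 0:
                fixed[name] = value * 10
    return fixed


# Made-up rows for illustration only, not SOEP data.
rows = [
    {"bfpinta": 9, "bfp4201": 4, "bfp7201": -2},   # affected interview mode
    {"bfpinta": 1, "bfp4201": 40, "bfp7201": 70},  # other mode, already 0-100
]
corrected = [rescale_bfp_probabilities(r) for r in rows]
```

Applying this to v32.1 data would scale the values twice, so it only makes sense for files from the first v32 delivery.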

SOEP-Core soep.v32

The new data release (1984–2015) "SOEP.v32" provides, for the most recent survey year 2015, the usual wave-specific data files BFPBRUTTO, BFP, BFPEQUIV, BFP_MIG, BFPKAL, BFPGEN, BFPAGE17, BFHBRUTTO, BFH, BFHGEN, BFKIND, and BEPLUECKE, as well as the updated files with a longitudinal component (PFAD files, biography files, spell data, and weighting factors).
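
The wave prefix in these file names follows from the survey year. The sketch below shows one way to derive it and to expand the "$" placeholder used in names such as $PGEN or $PEQUIV. It assumes that wave A corresponds to the first survey year, 1984, and that a second letter is added after wave Z; this is consistent with the pairs given in this document (BA = 2010, BD = 2013, BF = 2015). The function names wave_prefix and wave_file are hypothetical.

```python
import string

FIRST_WAVE_YEAR = 1984  # assumption: wave A is the first survey year


def wave_prefix(survey_year):
    """Map a survey year to its wave prefix, e.g. 1984 -> 'A', 2015 -> 'BF'."""
    offset = survey_year - FIRST_WAVE_YEAR
    if offset < 0:
        raise ValueError("survey year lies before the first wave")
    letters = string.ascii_uppercase
    if offset < 26:
        return letters[offset]
    # After wave Z a second letter is used: BA, BB, BC, ...
    cycle, position = divmod(offset, 26)
    return letters[cycle] + letters[position]


def wave_file(template, survey_year):
    """Replace the '$' placeholder in a file-name template with the wave prefix."""
    return template.replace("$", wave_prefix(survey_year))


print(wave_file("$PGEN", 2015))
```

Templates that contain "$$" (the year placeholder in $PEQUIV variable names) are a different convention and should not be passed to wave_file.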

1. New migrant subsample (M2)

In 2013, we conducted the first IAB-SOEP Migration Sample in partnership with the Institute for Employment Research (IAB) in Nuremberg (for an overview of M1, see SOEP Survey Paper 216). The households from the second IAB-SOEP Migration Sample, surveyed in 2015, are now also included in the SOEP data. The target population of the second IAB-SOEP Migration Sample consists of immigrants who arrived in Germany between 2010 and 2013. Migrants from the new EU member states in Eastern Europe dominate this group. This focus will make it possible to better describe the dynamic recent evolution of immigration to Germany. Sample M2 consists of 1,096 households and was, like Sample M1, drawn from register data from the Federal Employment Agency.

Record Linkage

Please note that data from both samples can be linked with administrative employment and income data; survey respondents are asked to provide explicit consent to record linkage. Because the linked dataset contains social data, these weakly anonymized data are accessible only on site at the Research Data Center of the German Federal Employment Agency at the IAB (FDZ IAB). Researchers can access FDZ IAB data through a guest visit to the IAB or through remote data processing, which is also arranged with the IAB. The linked data will soon be available to external researchers. Requests for data access should be directed to FDZ IAB, since a contract with IAB for data use is required.

For more information, see the FDZ IAB website.

2. Weighting

  • In version v32 of the SOEP data, the new migrant subsample, M2, has been integrated into the SOEP weighting framework. As is our usual practice when a new sample is integrated into the SOEP, we make different weighting factors available for the first wave. The standard weights (bfhhrf/bfphrf) allow researchers to draw inferences about the underlying population of residents in Germany based on all SOEP samples. The variables bfhhrfam1/bfphrfam1 allow the same inferences, but using only data from the old Samples A to M1. Comparing the two sets of weights thus enables researchers to gauge the influence of the recent enlargement of the SOEP on population estimates. Weights specific to the recent enlargement M2, bfhhrfm2/bfphrfm2, allow researchers to draw inferences about the target population of immigrants to Germany between 2010 and 2013.
  • The adjustment of weights to census margins on the individual level has been updated since 1984 so that now the number of women and men in each age group (five-year categories) is given as the margin. Up to now, two separate margins were used for sex and age group.
  • Upon request, we now provide weighting factors for survey years 2010 to 2013 (waves BA to BD) excluding Samples L1 to L3. Due to differences in survey instruments used with Samples L1 to L3 in the corresponding waves as part of the "Familien in Deutschland" (Families in Germany) survey, a need for weighting may arise when variables are to be analyzed that were not surveyed in the other samples.
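
The comparison between the two sets of person-level weights mentioned in the first item can be sketched as follows. This is a minimal illustration with made-up numbers, not SOEP data. The helper weighted_mean and the column name y are hypothetical, and the sketch assumes that respondents outside Samples A to M1 carry a weight of zero in bfphrfam1.

```python
def weighted_mean(rows, value_key, weight_key):
    """Weighted mean of value_key, skipping rows with zero or missing weight."""
    total = 0.0
    weight_sum = 0.0
    for row in rows:
        weight = row.get(weight_key) or 0.0
        if weight <= 0:
            continue
        total += weight * row[value_key]
        weight_sum += weight
    if weight_sum == 0:
        raise ValueError("no positive weights found for " + weight_key)
    return total / weight_sum


# Made-up illustration data; "y" stands for any outcome of interest.
persons = [
    {"y": 10.0, "bfphrf": 2.0, "bfphrfam1": 3.0},
    {"y": 20.0, "bfphrf": 2.0, "bfphrfam1": 3.0},
    {"y": 40.0, "bfphrf": 4.0, "bfphrfam1": 0.0},  # assumed M2 respondent
]

all_samples = weighted_mean(persons, "y", "bfphrf")     # all SOEP samples
old_samples = weighted_mean(persons, "y", "bfphrfam1")  # Samples A to M1 only
enlargement_effect = all_samples - old_samples
```

A large difference between the two estimates indicates that the outcome is sensitive to the inclusion of Sample M2.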

3. Changed datasets or variables

  • MIGSPELL: With the integration of the data from 2013 (BD) to 2015 (BF), larger changes in the number and coding of the MIGSPELL variables were necessary, since in particular the status upon entry to Germany was surveyed in the individual waves with differing degrees of specificity. In addition, an improved procedure was introduced for imputation of missing data. A detailed description of the new version of MIGSPELL can be found in the SOEP 2015 documentation on Biography and Life History Data (coming soon).
  • Variables connected to occupations:
    - The variable names have changed and should now be more informative; the name of the coding scheme is now part of the variable name, e.g., isco88.
    - The occupational codes (KldB92, ISCO-88) now comply better with official standards (e.g., variables with suffixes _kldb92 or _isco88 in $P files).
    - In $PGEN there are now also variables using the coding schemes for KldB2010 and ISCO-08.
    - The code for generating the derived prestige scales has been redesigned, e.g., egp88_12 for the EGP class based on ISCO-88 in the year 2012.
  • BIOIMMIG: The variable biwfam ("Already Had Family In Country") was recoded incorrectly in the generated dataset for the migration samples in 2013 and 2014. This was corrected in the current data release.
  • Survey Year: With Version 32, variables referring to the survey year are referred to consistently as syear. Previously there were a few variables with names like erhebj and svyyear.
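
Users who combine v32 files with extracts from earlier releases may need to harmonize the survey-year variable themselves. The following is a minimal sketch under that assumption: rows are plain dictionaries, the helper name harmonize_survey_year and the column name id are hypothetical, and only the two legacy names mentioned above (erhebj and svyyear) are handled.

```python
# Legacy names mentioned in the release note; other old names may exist.
LEGACY_YEAR_NAMES = ("erhebj", "svyyear")


def harmonize_survey_year(row):
    """Return a copy of row whose survey-year column is called 'syear'."""
    renamed = {}
    for name, value in row.items():
        key = "syear" if name.lower() in LEGACY_YEAR_NAMES else name
        # Guard against two year columns that disagree with each other.
        if key in renamed and renamed[key] != value:
            raise ValueError("conflicting survey years in one row")
        renamed[key] = value
    return renamed


old_row = {"id": 1, "erhebj": 2012}
new_row = harmonize_survey_year(old_row)
```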

4. New datasets or variables

  • BIOIMMIG: Additional variable for the main reason for migrating to Germany (only available since 2014).
  • PFLEGE: A new variable, appraisal, with the label “officially assessed as in need of care”.
  • $PEQUIV: Six new variables:
    - ichsu$$: Child support, caregiver alimony
    - fchsu$$: Imputation flag child support, caregiver alimony
    - ispou$$: Divorce alimony
    - fspou$$: Imputation flag Divorce alimony
    - irie1$$: Riester pension plan
    - irie2$$: Riester widow pension plan
  • PPFAD: Person-related meta dataset
    - Some immigration variables (GERMBORN, CORIGIN, and IMMIYEAR) previously contained a -3 for all respondents in Sample G who were not asked to state their country of birth and year of immigration. Since respondents from other samples (e.g., A) were also not directly asked to provide this information and were coded -2, the coding of missing values was not consistent across samples. This inconsistency was corrected in the new update (v32).
    - Respondents who immigrated in the year 1949 (when the Federal Republic of Germany was founded) were previously considered not to have been born in Germany due to a coding error. This has been fixed in the updated version: in accordance with the German Microcensus, all persons who immigrated before 1950 are now considered to have been born in Germany (those who immigrated after 1949 are not). This also led to a change in the value label of IMMIYEAR.
    - More information was considered in the updated version of MIGINFO, leading to changes in the values.
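
The 1950 cutoff described for PPFAD is easy to get wrong by one year, which is what the earlier coding error amounted to. The sketch below illustrates the corrected rule only. It is not the code used to generate GERMBORN; the function name considered_born_in_germany is hypothetical, and treating negative values of IMMIYEAR as missing codes is an assumption.

```python
CUTOFF_YEAR = 1950  # immigration before this year counts as born in Germany


def considered_born_in_germany(immiyear):
    """Apply the v32 cutoff to a year of immigration.

    Returns None when the year is missing (None or a negative code), because
    the rule cannot be applied without a valid year.
    """
    if immiyear is None or immiyear < 0:
        return None
    return immiyear < CUTOFF_YEAR


for year in (1948, 1949, 1950, -2):
    print(year, considered_born_in_germany(year))
```

Under the earlier, faulty coding, the year 1949 fell on the wrong side of the cutoff; with the strict comparison against 1950 it is treated like all earlier years.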

