SOEP-Core v36eu (data 1984-2019, EU Edition)

The Socio-Economic Panel (SOEP) is a representative, multi-cohort survey that has been running since 1984. Every year, individuals in households throughout Germany are surveyed by our survey institute on behalf of DIW Berlin. These respondents provide information on topics such as their income, employment history, education, and health. Because the same people are surveyed every year, it is possible to track long-term psychological, economic, societal, and social developments. To keep pace with changes in society, random samples are added regularly and the survey is adapted accordingly.

Dataset Information

Title: Socio-Economic Panel (SOEP), data from 1984-2019, EU Edition

DOI: 10.5684/soep.core.v36eu
Collection period: 1984-2019
Publication date: 2021-03-31
Principal investigators: Stefan Liebig, Jan Goebel, Markus Grabka, Carsten Schröder, Sabine Zinn, Charlotte Bartels, Alexandra Fedorets, Andreas Franken, Martin Gerike, Florian Griese, Jannes Jacobsen, Selin Kara, Johannes König, Peter Krause, Hannes Kröger, Elisabeth Liebau, Maria Metzing, Jana Nebelin, Marvin Petrenz, David Richter, Paul Schmelzer, Christian Schmitt, Jürgen Schupp, Daniel Schnitzlein, Rainer Siegers, Hans Walter Steinhauer, Knut Wenzig, Stefan Zimmermann

Contributor: Kantar Deutschland GmbH (Data Collector)

Population: Persons living in private households in Germany

Number of households: 19,032

Number of individuals: 32,050 + 3,476 children

Special samples: Citizens of the GDR (1990), Immigration/Migration (1994/95, 2013, 2015), Refugees (since 2016). See the chapter SOEP-Samples in Detail on the SOEPcompanion for a description of all our samples.

Selection method: All SOEP samples are regionally clustered, multi-stage random samples. Households are selected either by random walk or from registers (register samples).

Collection mode: The SOEP interview methodology is based on a set of pre-tested questionnaires for households and individuals. In principle, an interviewer tries to obtain face-to-face interviews with all members of a given survey household aged 12 years and over. In addition, one person (the head of household) is asked to answer a household questionnaire covering housing, housing costs, and different sources of income. This questionnaire also includes some questions on children up to 12 years of age in the household, mainly concerning attendance at institutions (kindergarten, elementary school, etc.).

Citation of the Data Set: Socio-Economic Panel (SOEP), data for years 1984-2019, SOEP-Core v36, EU Edition, 2021, doi:10.5684/soep.core.v36eu

If you do not exclude observations from the Migration Samples in your analysis, please also cite:
IAB-SOEP Migration Samples (M1, M2), data for years 2013-2019, DOI: 10.5684/soep.iab-soep-mig.2019

If you do not exclude observations from the Refugee Samples in your analysis, please also cite:
IAB-BAMF-SOEP Survey of Refugees (M3-M5), data for years 2016-2019, DOI: 10.5684/soep.iab-bamf-soep-mig.2019
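Which extra dataset citations apply depends only on whether observations from samples M1-M2 or M3-M5 remain in the analysis. The sketch below encodes that rule; it assumes you have already collected the sample labels present in your analysis file as strings such as "M1" (how sample membership is coded in the SOEP data is not shown here), and the function name is hypothetical.

```python
# Hypothetical helper: list the dataset DOIs to cite, given the sample labels
# (e.g. "A", "M1", "M4") that remain in an analysis.
# Sample groups follow the citation rules above:
#   M1, M2  -> IAB-SOEP Migration Samples
#   M3-M5   -> IAB-BAMF-SOEP Survey of Refugees
CORE_DOI = "10.5684/soep.core.v36eu"
MIGRATION_DOI = "10.5684/soep.iab-soep-mig.2019"
REFUGEE_DOI = "10.5684/soep.iab-bamf-soep-mig.2019"

MIGRATION_SAMPLES = {"M1", "M2"}
REFUGEE_SAMPLES = {"M3", "M4", "M5"}


def required_dataset_dois(samples_used):
    """Return the dataset DOIs to cite for the given collection of sample labels."""
    used = {s.strip().upper() for s in samples_used}
    dois = [CORE_DOI]  # the SOEP-Core data set itself is always cited
    if used & MIGRATION_SAMPLES:
        dois.append(MIGRATION_DOI)
    if used & REFUGEE_SAMPLES:
        dois.append(REFUGEE_DOI)
    return dois


print(required_dataset_dois({"A", "B", "M2"}))
```

The same logic applies to the reference articles listed further down, which follow the identical "if you do not exclude" conditions.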

SOEP-Core - Reference Articles

Publications using this file should refer to the DOI given above and cite the following references:

Goebel, Jan, Markus M. Grabka, Stefan Liebig, Martin Kroh, David Richter, Carsten Schröder, and Jürgen Schupp. 2019. The German Socio-Economic Panel (SOEP). Jahrbücher für Nationalökonomie und Statistik (Journal of Economics and Statistics) 239 (2), 345-360. (https://doi.org/10.1515/jbnst-2018-0022)

If you do not exclude the cases of the migration samples in your analysis, then please also cite the following reference:

Brücker, Herbert, Martin Kroh, Simone Bartsch, Jan Goebel, Simon Kühne, Elisabeth Liebau, Parvati Trübswetter, Ingrid Tucci, and Jürgen Schupp. 2014. The new IAB-SOEP Migration Sample: an introduction into the methodology and the contents. SOEP Survey Paper 216, Series C. Berlin, Nürnberg: DIW Berlin.

If you do not exclude the cases of the refugee samples in your analysis, please also cite:

Brücker, Herbert, Yuliya Kosyakova, Nina Rother, Sabine Zinn, Elisabeth Liebau, Wenke Gider, Silvia Schwanhäuser, and Manuel Siegert. 2025. Exploring Integration and Migration Dynamics: The Research Potentials of a Large-Scale Longitudinal Household Study of Refugees in Germany. European Sociological Review. (https://doi.org/10.1093/esr/jcaf032)

If you use data from the SOEP-LEE2 surveys, please also cite:

Matiaske, Wenzel, Torben Dall Schmidt, Christoph Halbmeier, Martina Maas, Doris Holtmann, Carsten Schröder, Tamara Böhm, Stefan Liebig, and Alexander S. Kritikos. 2023. SOEP-LEE2: Linking Surveys on Employees to Employers in Germany. Jahrbücher für Nationalökonomie und Statistik Data Observer, 1–14. (https://doi.org/10.1515/jbnst-2023-0031)

If you would like to refer to specific research areas, please also cite the relevant references below:

Schröder, Carsten, Johannes König, Alexandra Fedorets, Jan Goebel, Markus M. Grabka, Holger Lüthen, Maria Metzing, Felicitas Schikora, and Stefan Liebig. 2020. The economic research potentials of the German Socio-Economic Panel study. German Economic Review 21 (3), 335-371. (https://doi.org/10.1515/ger-2020-0033)
Giesselmann, Marco, Sandra Bohmann, Jan Goebel, Peter Krause, Elisabeth Liebau, David Richter, Diana Schacht, Carsten Schröder, Jürgen Schupp, and Stefan Liebig. 2019. The Individual in Context(s): Research Potentials of the Socio-Economic Panel Study (SOEP) in Sociology. European Sociological Review 35 (5), 738-755. (https://doi.org/10.1093/esr/jcz029)
Jacobsen, Jannes, Magdalena Krieger, Felicitas Schikora, and Jürgen Schupp. 2021. Growing Potentials for Migration Research using the German Socio-Economic Panel Study. Jahrbücher für Nationalökonomie und Statistik 241 (4), 527-549. (https://doi.org/10.1515/jbnst-2021-0001)
Fedorets, Alexandra, Stefan Kirchner, Jule Adriaans, and Oliver Giering. 2022. Data on Digital Transformation in the German Socio-Economic Panel. Jahrbücher für Nationalökonomie und Statistik 242 (5-6), 691-705. (https://doi.org/10.1515/jbnst-2021-0056)

SOEP-Core v36 - Further Editions

For the SOEP-Core data 1984-2019 (v36) - waves A to BJ - we provide the following editions:

soep.core.v36eu (EU Edition, 100%)

soep.core.v36i (International Scientific Use Version, 95%)

soep.core.v36t (Teaching Edition, 50%)

soep.core.v36at (Add-on: Area types)

soep.core.v36pr (Add-on: Planning regions)

soep.core.v36r (Remote Edition)

soep.core.v36o (Onsite Edition)

Find detailed information on the SOEPcompanion.

The following datasets are included in SOEP-Core v36 but are also available as individual datasets upon request:

soep.iab-soep-mig.2019 (Migration samples)

soep.iab-bamf-soep-mig.2019 (Refugee samples)

SOEP-Core v36 (2019) - Changes in the Dataset

New samples in the main SOEP study

New Sample P

“Top Shareholder Sample”: Sample P was designed as a sample of highly affluent households in Germany. Income and wealth inequality in Germany increased over recent decades despite economic growth, and against this backdrop the social sciences have increasingly needed data on wealthy populations. Sample P was created to improve the empirical basis for the German government’s poverty and wealth report and to lay the foundation for medium- and long-term cross-sectional and longitudinal analysis. The gross sample consisted of 23,259 households.

New Sample Q

“LGB* Sample”: Sample Q is a boost sample of a hard-to-survey population: lesbians, gays, bisexuals, transgender people, and people who identify as non-binary. Although the actual share of LGBTQ+ people in the general population is unknown, this group was too sparsely represented in the SOEP to allow meaningful analysis. In total, 835 households were recruited through a telephone screening process lasting approximately nine months. Of these, 477 households participated in the survey between April and November.

Changes in our new main data format, SOEPlong

Dataset BIOL - Variables on recognition of occupational qualifications in samples M3-M5 and the CAMCES module (identifiable in the variable label by the abbreviation AA/AAC) were corrected. The slightly different biographical questionnaire for samples M1-M2 is no longer used, and variables on migration history have been added to the SOEP-Core biographical questionnaire, which is now used for all samples. The variables have been integrated, versioned, and harmonized in biol accordingly. The religious affiliation of the father and mother has been reversioned and harmonized to include the response option “konfessionslos” (no religious affiliation). Additional variables with occupational codes have been added. Some variables at the federal state level were included as East-West variables with the suffix _ew. Since bioresid and biosoc will no longer be part of the data distribution, some data processing steps for the variables in these datasets have been included in the versioning and harmonization routines for biol.

Dataset PL - Variables on balance of assets (identifiable in variable label by abbreviation VB) have been corrected, re-sorted, and labeled. Inheritance variables have been re-versioned. Religious background has been re-versioned and harmonized to include the response option “konfessionslos” (no religious affiliation).

Dataset HBRUTTO - Some regional variables at the federal state level have been included as East-West variables with the suffix _ew. New variables on incentive type and incentive model, as well as variables describing the screening process for the LGB sample, have been added. Residential environment variables (wum) are no longer collected in the survey as of 2019.

Dataset PBRUTTO - Some regional variables at the federal state level have been included as East-West variables with the suffix _ew. New variables on DRV record linkage and IAB record linkage have been added. Variables have been added indicating which questionnaire was used.

Dataset JUGENDL - Since bioage17 will no longer be part of the data distribution, some of the data processing steps for bioage17 variables have been included in the versioning and harmonization routines for jugendl.

Dataset KIDLONG - k_nrkid has been corrected to count only 16-year-old children in the household. Households with children without a stated birth year have been assigned the missing value -1. The variable bgk93_r/kd_cty_r contained incorrect values; this has been corrected.

Dataset PLUECKEL - lpid has been removed.

Dataset HBRUTT - Some regional variables at the federal state level have been included as East-West variables with the suffix _ew.

Introducing SOEP Data Editions and a new missing value to manage restricted-access information

Due to changes in data protection and privacy law, variables containing information on Germany’s federal states (Bundesländer) may not be transmitted to recipients outside the European Union. We have developed a new concept with different editions for the different data access procedures resulting from the change in law (listed in ascending order by the amount of information contained in each edition):

Teaching Edition (50% sample, doi:10.5684/soep.core.v36t)
International Edition (95% sample, doi:10.5684/soep.core.v36i)
EU Edition (100% sample, doi:10.5684/soep.core.v36eu)
Area Types (add-on for EU Edition: classification of areas, 100% sample, doi:10.5684/soep.core.v36at)
Planning Regions (add-on for EU Edition: 96 planning regions, doi:10.5684/soep.core.v36pr)
Remote Edition (available through remote execution, including counties, doi:10.5684/soep.core.v36r)
On-Site Edition (available only on site, including municipalities, zip codes, and geo-coordinates, doi:10.5684/soep.core.v36o)
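The edition list above effectively tells users which edition they must apply for to obtain a given level of regional detail. The lookup below is a small sketch of that mapping, built only from the list; the level labels and the function name are hypothetical, and the Teaching and International Editions are omitted because they contain no federal-state information.

```python
# Hypothetical lookup derived from the edition list above: the first edition
# (in the order listed) that contains each level of regional detail.
EDITION_FOR_LEVEL = {
    "federal state": "soep.core.v36eu",    # EU Edition
    "planning region": "soep.core.v36pr",  # Planning Regions add-on to the EU Edition
    "county": "soep.core.v36r",            # Remote Edition (remote execution only)
    "municipality": "soep.core.v36o",      # On-Site Edition
    "zip code": "soep.core.v36o",          # On-Site Edition
    "geo-coordinates": "soep.core.v36o",   # On-Site Edition
}


def edition_for(level: str) -> str:
    """Return the edition identifier that first includes the requested regional level."""
    key = level.strip().lower()
    if key not in EDITION_FOR_LEVEL:
        raise ValueError(f"unknown regional level: {level!r}")
    return EDITION_FOR_LEVEL[key]


print(edition_for("County"))  # soep.core.v36r
```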

The default edition that we transmit to European users by sending them a personalized download link is the EU Edition. Some datasets may not be available in more restricted editions. If variables are not available in a more restricted edition, they are recoded to -7, a new missing value labeled “only available in less restricted edition”.
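Because -7 marks information withheld from the current edition rather than a survey nonresponse, analyses usually need to treat it as missing and may want to detect variables that are entirely withheld. The sketch below does both on plain Python records. It assumes, as is conventional in SOEP data, that negative values are missing-value codes; that blanket rule is wrong for any variable whose valid range includes negative numbers, so check the codebook first. Variable names in the toy records are hypothetical.

```python
# Assumption: negative integers are missing-value codes (e.g. -1 as used in
# KIDLONG). -7 is the v36 code "only available in less restricted edition".
ONLY_IN_LESS_RESTRICTED_EDITION = -7


def to_missing(value):
    """Return None for negative (missing) codes, otherwise the value unchanged."""
    if isinstance(value, (int, float)) and not isinstance(value, bool) and value < 0:
        return None
    return value


def edition_restricted_columns(rows):
    """Variables coded -7 in every row, i.e. not delivered in this edition."""
    if not rows:
        return set()
    return {
        name
        for name in rows[0]
        if all(row.get(name) == ONLY_IN_LESS_RESTRICTED_EDITION for row in rows)
    }


# Toy records with hypothetical variable names.
records = [
    {"pid": 101, "county": -7, "age": 42},
    {"pid": 102, "county": -7, "age": -1},
]
print(sorted(edition_restricted_columns(records)))  # ['county']
print([to_missing(r["age"]) for r in records])      # [42, None]
```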

New datasets

Dataset BIOREGION

A new dataset on places in Germany that are of biographical importance to respondents (place of birth, first place of residence).
Information on the federal state of these places is included in the EU Edition. More localized information (county or municipality) is only available remotely or on site.

Dataset BIORESIDREFING

A new dataset on refugees’ place(s) of residence in Germany (Wohnorthistorie).
Information about the federal state in which refugees reside is included in the EU data edition. More localized information (county or municipality) is only available remotely or on site.

Dataset MORE_DOCU

A new dataset on the Mentoring of Refugees (MORE) project. Carried out in partnership with Start with a Friend (SWAF), this project aimed to bring refugees and locals together to form friendships. This dataset contains information on German contacts provided to refugees.
Information on the federal state of the SWAF location is included in the EU Edition. More localized information (county or municipality) is only available remotely or on site.

Dataset MORE_LOCAL

A new dataset on the Mentoring of Refugees (MORE) project. Carried out in partnership with Start with a Friend (SWAF), this project aimed at bringing refugees and locals together to form friendships. This dataset contains information from the surveys of the locals in the project.

Changes in datasets and individual variables

1. REGIONL

The dataset REGIONL contains regional information relating to the household address, such as municipality size, county, or zip code. Because most variables in this dataset are only available in editions with more restricted access (such as the Remote and On-Site Editions), they are set to -7 for all cases in this edition. Users without access to those editions can nevertheless see the variable definitions and the structure of the dataset.

2. BIOIMMIG

BIOIMMIG was originally generated by appending each new wave of data to the data from previous years. This practice is error-prone because the SOEP includes a large number of variables that must remain comparable over time. To minimize this risk, v36 is the first version of BIOIMMIG generated from the longitudinal data.

Variable bireason “Main reason for moving to Germany”

The values of bireason were changed because different versions of the variable had been integrated incorrectly. Previously, the version of the variable from the biographical questionnaire for samples A-L3 and N was merged into the version from the individual and biographical questionnaires for samples M1/M2, which led to inaccurate values. In addition, bireason now has fewer valid cases because the variables on "reasons for leaving country of origin" (biol: lr3136-lr3146), which had previously been used to generate it, were removed; these variables had been integrated inaccurately into bireason.

Variable biscger “Attended school in Germany”

The number of valid cases has increased significantly due to the addition of further variables and of the new migration questions in the biographical questionnaire. The variable “country of last school attendance” was only used as an indicator for generating biscger from wave bd (2013) onwards; the corresponding long variable, lb0186_v1, also has values for the years 2001-2012.

Variable bicamp “Refugee residence Y, N”

Due to two newly added variables (lr3440, plm0679), there are significantly more cases with a value of "2 No".

Variables birelh[p|gp|c|sb|sh|dr|fr] “Family in Country of Origin”

Previously, in the generation process, it was not defined whether these variables should represent country of origin or country of residence or both. In the previous versions, the two were arbitrarily merged. The decision to include only the country of origin leads to a significant reduction of cases.

Variable birelhc2 “Underage children not in Germany”

Due to two newly added years (1997, 1999), there are significantly more cases with the value "2 No".

Variable biwfam “Already had family in country”

The variable biwfam was changed from a category (y/n) to a binary variable, since otherwise there would be a distortion of the content. The data generated previously were too imprecise, since the recoding to value “2 No” does not unambiguously exclude cases in which respondents have no family members in Germany.

For a closer look at the changes and variables used, see the bioimmig documentation.

3. HGEN

Variable hgtyp2hh in hgen

The variable no longer distinguishes the gender of single households, meaning that the old categories 11-16 were replaced with categories 11 (1-Person HH less than 35 years), 12 (1-Person HH 35 to less than 60 years), and 13 (1-Person HH greater than or equal to 60 years).
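The new single-household categories depend only on age bands, so they can be reproduced with a simple threshold function. The sketch below assumes the age of the household's only member is available in completed years; how HGEN actually computes age (reference date, source variable) is not documented here, and the function name is hypothetical. It does not attempt to map the old categories 11-16 onto the new ones, since that mapping is not given in this section.

```python
def hgtyp2hh_single(age: int) -> int:
    """Hypothetical helper: v36 hgtyp2hh code for a one-person household.

    Uses only the age bands described above; gender no longer plays a role.
    """
    if age < 0:
        raise ValueError("age must be non-negative")
    if age < 35:
        return 11  # 1-person HH, less than 35 years
    if age < 60:
        return 12  # 1-person HH, 35 to less than 60 years
    return 13      # 1-person HH, 60 years or older


print([hgtyp2hh_single(a) for a in (34, 35, 59, 60)])  # [11, 12, 12, 13]
```

The band edges follow the labels: 35 falls into category 12 and 60 into category 13.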

Datasets no longer distributed

Datasets BIOSOC, BIORESID, BIOAGE17

The datasets biosoc, bioage17, and bioresid are no longer provided. To reduce the number of datasets and avoid redundant information, most of the information from biosoc and bioresid is now retained in biol under different variable names, and the variables from bioage17 are retained in jugendl. The generated data from biosoc, bioage17, and bioresid are reproduced as closely as possible in biol and jugendl by applying versioning and harmonization. Users who have worked with biosoc, bioage17, or bioresid should use this table to facilitate the transition.

The datasets with the suffixes mig and refugees—for instance, bep_mig and bgp_refugees—are no longer available. This information from the migration and refugee samples is fully integrated into the associated “raw” and “long” files.

Survey Instruments

Individual (PAPI) 2019: Field-de Field-en
Individual (CAPI) 2019: Var-de
Household (PAPI) 2019: Field-de Field-en
Household (CAPI) 2019: Var-de
Biography (PAPI) 2019: Field-de Field-en
Biography (CAPI) 2019: Var-de
Catch-up Individual (PAPI) 2019: Field-de
Catch-up Individual (CAPI) 2019: Var-de
Youth (16-17-year-olds, PAPI) 2019: Field-de
Youth (16-17-year-olds, CAPI) 2019: Var-de
Early Youth (13-14-year-olds, PAPI) 2019: Field-de
Early Youth (13-14-year-olds, CAPI) 2019: Var-de
Pre-teen (11-12-year-olds, PAPI) 2019: Field-de
Pre-teen (11-12-year-olds, CAPI) 2019: Var-de
Mother and Child (Newborns, PAPI) 2019: Field-de
Mother and Child (Newborns, CAPI) 2019: Var-de
Mother and Child (2-3-year-olds, PAPI) 2019: Field-de
Mother and Child (2-3-year-olds, CAPI) 2019: Var-de
Mother and Child (5-6-year-olds, PAPI) 2019: Field-de
Mother and Child (5-6-year-olds, CAPI) 2019: Var-de
Parents and Child (7-8-year-olds, PAPI) 2019: Field-de
Parents and Child (7-8-year-olds, CAPI) 2019: Var-de
Mother and Child (9-10-year-olds, PAPI) 2019: Field-de
Mother and Child (9-10-year-olds, CAPI) 2019: Var-de
Deceased Individual (PAPI) 2019: Field-de
Deceased Individual (CAPI) 2019: Var-de

All sample-specific questionnaires for this year and all questionnaires from previous years can be found on this site.

Documentation of the datasets

1) SOEP-Core – 2019: Report of Survey Methodology and Fieldwork

2) Documentation of ISCED generation based on the CAMCES Tool in the IAB-SOEP Migration Samples M1/M2 and IAB-BAMF-SOEP Survey of Refugees M3/M4/M5 until 2019

3) SOEP-Core v36 – Documentation of Sample Sizes and Panel Attrition in the German Socio-Economic Panel (SOEP) (1984 until 2019)

4) SOEP-Core – 2019: Design, Nonresponse, and Weighting in the Sample Q (Queer)

5) SOEP-Core v36 – COGDJ

6) SOEP-Core – 2019: Sampling, Nonresponse, and Weighting in the Sample P

7) SOEP-Core v36 – PPATHL: Person-Related Meta-Dataset

8) SOEP-Core v36 – Biographical Information in the Meta File PPFAD (Month of Birth, Immigration Variables, Living in East or West Germany in 1989)

9) SOEP-Core v36 – HPATHL: Household-Related Meta-Dataset

10) SOEP-Core v36 – INTERVIEWER

11) SOEP-Core v36 – PBRUTTO: Person-Related Gross File

12) SOEP-Core v36 – HBRUTTO: Household-Related Gross File

13) SOEP-Core v36 – PGEN: Person-Related Status and Generated Variables

14) SOEP-Core v36 – HGEN: Household-Related Status and Generated Variables

15) SOEP-Core v36 – Codebook for the $PEQUIV File 1984-2019: CNEF Variables with Extended Income Information for the SOEP

16) SOEP-Core v36 – BIOIMMIG

17) SOEP-Core v36 – HEALTH

18) SOEP-Core v36 – BIOPAREN: Biography Information for the Parents of SOEP-Respondents

19) SOEP-Core v36 – BIOAGEL & BIOPUPIL: Generated Variables from the "Mother & Child", "Parent", "Pre-Teen", and "Early Youth" Questionnaires

20) SOEP-Core v36 – BIOSIB: Information on Siblings in the SOEP

21) SOEP-Core v36 – The Couple History Files BIOCOUPLM and BIOCOUPLY, and Marital History Files BIOMARSM and BIOMARSY

22) SOEP-Core v36 – PFLEGE: Documentation of Generated Person-level Long-term Care Variables

23) SOEP-Core v36 – LIFESPELL: Information on the Pre- and Post-Survey History of SOEP-Respondents

24) SOEP-Core v36: Codebook for the EU-SILC-like panel for Germany based on the SOEP

Documentation

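1) Zur Erhebung des adaptiven Verhaltens von zwei- und dreijährigen Kindern im Sozio-oekonomischen Panel (SOEP) [On measuring the adaptive behavior of two- and three-year-old children in the Socio-Economic Panel (SOEP)]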

2) Assessing the distributional impact of "imputed rent" and "non-cash employee income" in microdata : Case studies based on EU-SILC (2004) and SOEP (2002)

All documentation for filtering can be found on this page

