
SOEP-Core v35 (2018) - Changes in the Dataset

Changes to the Dataset

Dataset Information

SOEP-Core soep-core.v35

1. New sample in the main SOEP study

The new refresher sample, Subsample O, contains 1,000 new households. These were selected in cooperation with BBSR using a new sampling design based on regional data in areas where the “Soziale Stadt” (social city) urban development project is being carried out. Based on the digital data available on the boundaries of the “Soziale Stadt” areas, it was possible to create a new variable going back to the year 2000 that shows whether or not a household’s address is within an area covered by the project (see Variable Description below under 4.4).

2. Modifications in our new main data format, SOEPlong

We have made the following important changes over and above our normal annual updates:

  • PKAL: Integration of the $PKALOST datasets
  • PL/PKAL: All calendar strings are now stored in PL, and monthly variables in PKAL.
  • PLUECKEL: Introduction of RYEAR and correction of SYEAR, which contained the RYEAR value up to now.
  • PBRUTTO: If a variable was not part of the year-specific gross file, the missing code is now -8 instead of -2.
  • VPL: The case numbers for past years have increased, since cases without a SOEP respondent are no longer deleted.
  • KIDLONG: The harmonization concept has been aligned with the concept used for other datasets, and more variables from the $KIND datasets have been included (more information under 5.4).
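
For users who combine extracts from earlier releases with v35, the PBRUTTO change listed above can be sketched as a simple recode. This is a minimal Python illustration, not SOEP tooling: the row layout, the `in_gross_file` lookup, and the variable names `var_a` and `var_b` are assumptions made for the example. Only the codes -2 and -8 come from the text above.

```python
OLD_CODE = -2  # code used before v35 when a variable was not in that year's gross file
NEW_CODE = -8  # code used from v35 onward for the same situation


def harmonize_pbrutto(rows, in_gross_file):
    """Return copies of rows with -2 recoded to -8 for variables that were
    not part of the year-specific gross file.

    rows: list of dicts with a "syear" key plus variable values (hypothetical layout)
    in_gross_file: dict mapping year -> set of variable names that were
                   part of that year's gross file (hypothetical lookup)
    """
    result = []
    for row in rows:
        available = in_gross_file.get(row["syear"], set())
        new_row = dict(row)  # copy so the input is left untouched
        for name, value in row.items():
            if name == "syear":
                continue
            if value == OLD_CODE and name not in available:
                new_row[name] = NEW_CODE
        result.append(new_row)
    return result


# Hypothetical example data
rows = [
    {"syear": 2000, "var_a": -2, "var_b": 1},
    {"syear": 2018, "var_a": -2, "var_b": -2},
]
in_gross_file = {2000: {"var_b"}, 2018: {"var_a", "var_b"}}
harmonized = harmonize_pbrutto(rows, in_gross_file)
```

In this sketch a -2 is only recoded where the variable was absent from the gross file for that year, so a -2 with another meaning (for example "does not apply") is left alone.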

3. New in SOEPhelp

  • SOEPhelp now includes links between topics and variables from the metadata. The data overview (command: soephelp (without variable)) lists all the topics in the dataset and tells which variables belong to which topic.
  • The variable overview (command: soephelp [variable]) lists the topics covered by the variable (and the relationships among topics and subtopics). The topic labels are linked to
  • SOEPhelp now has a search tool! If you type in the command: soephelp, search (SEARCH TERM) [verbose], you will get a list of the variables for which your SEARCH TERM is contained either in the question or one of the answer options. The variables are provided in list form and saved in r (for returns). The option “verbose” describes the variables in more detail.
  • More information on SOEPhelp

4. New Datasets and Variables

4.1. Early Childhood

  • New dataset BCBFK “Early Childhood” with geographically detailed information about the places where the respondents grew up. Because of the detailed regional data, the dataset is only available through the RDC SOEP. The corresponding field report and questionnaire are available as SOEP Survey Paper 766 (in German).

4.2. Your Life in the GDR

  • New dataset DDR18 “Your Life in the GDR”. The corresponding questionnaire is available as Survey Paper 676.

4.3. Biography follow-up survey

  • The variables from the biography follow-up survey on migration status have been integrated into the dataset BILELA or BIOL.

4.4. New variable SOCURBAN in dataset HBRUTTO

  • SOCURBAN: Household address is in an area where the “Soziale Stadt” (social city) urban development project is being carried out (as of July 2017) (Yes/No)

4.5. New variables in dataset EQUIV

  • ILIB1$$: pensions for liberal professions
  • ILIB2$$: widow's/orphan's pensions for liberal professions

4.6. New variables in dataset BIOJOB

  • In 2018, respondents received new survey instruments concerning job classifications and prestige scores. This information is provided in the following new variables: STBA10, ISCO08, EGP08, ISEI08, MPS08, and SIOPS08. The corresponding variables STBA, EGP, ISEI, MPS, and SIOPS from older versions of BIOJOB have been renamed to STBA92, EGP88, ISEI88, MPS92, and SIOPS88.
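
Scripts written against an older BIOJOB release may need their variable lists updated. The renaming described above can be expressed as a lookup table, sketched here in Python. The mapping itself is taken from the text; the helper function and the `PID` column in the example are hypothetical, and real files may store names in a different case.

```python
# Old BIOJOB variable name -> name used in v35 (from the list above)
BIOJOB_RENAMES = {
    "STBA": "STBA92",
    "EGP": "EGP88",
    "ISEI": "ISEI88",
    "MPS": "MPS92",
    "SIOPS": "SIOPS88",
}


def rename_old_biojob_columns(column_names):
    """Translate pre-v35 BIOJOB variable names to their v35 names.

    Names that are not in the mapping are returned unchanged.
    """
    return [BIOJOB_RENAMES.get(name, name) for name in column_names]


# Hypothetical column list from an older analysis script
old_columns = ["PID", "STBA", "EGP", "SIOPS"]
new_columns = rename_old_biojob_columns(old_columns)
```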

5. Changes to datasets or individual variables

5.1. Weighting variable PHRF in the dataset PPATHL

  • There are slight changes concerning the poststratification of the weighting variables starting in 2013. The changes relate to the year of immigration. Previously, respondents who immigrated before 1955 were treated as migrants; they now constitute a distinct category of their own, along with recent immigrants and German-born respondents. The reason is that it is not possible to define ethnic Germans consistently between the Mikrozensus and the SOEP.

5.2. Variables representing occupational codes

  • Since 2013, open-ended questions on occupations have been coded in ISCO-08 and KldB 2010. This is the first year in which the old classifications ISCO-88 and KldB 92 are no longer available. We have therefore introduced new prestige scores based on the new classifications and discontinued the old scores.
  • Calendar strings have been moved from $PKAL to $P or standardized.

5.3. Educational variables

  • Up to soep.v34, the basic generated educational variables were generated annually and were cumulated over time. Due to the availability of SOEPlong, we have substantially revised the tools used for generating variables to always consider all available educational variables for each year.
  • Besides generating all variables entirely from SOEPlong files, we have made two further modifications:
  • First, the main educational variables now also take into account inconsistencies over time, in contrast to the educational variables in PGEN prior to soep.v34.
  • Second, variable “Amount of Education or Training in Years” ($$BILZEIT) has been slightly modified. To consider occupational training (for non-university degrees), we have adjusted the years of education for “civil servants” and “others” slightly.

5.4. Dataset KIDLONG

  • Errors in the integration of variables were corrected, variables were split into versioned variables, and harmonized variables were constructed. As a result, the number of variables has increased from 110 (v.34) to 267 (v.35).
  • Missing variables from the $KIND datasets were incorporated into KIDLONG.
  • The corrected version of BHKIND was incorporated into KIDLONG.
  • KIDLONG now adheres to the classic harmonization concept.

5.5. Dataset BHKIND

  • Flag variable added to identify child questionnaires that were not completed (BHKFLAG).
  • Missing observations were added, increasing the number of observations from 15,032 (v.34) to 15,504 (v.35).
  • Errors in the integration of variables were corrected and missing variables were incorporated into BHKIND: 85 variables (v.34); 129 variables (v.35).
  • All variables were renamed and now follow the SOEP naming conventions.

5.6. Dataset BIKIND

  • Flag variable added to identify child questionnaires that were not completed (BIKFLAG)
  • All variables now follow the SOEP naming conventions.

5.7. Variable PARID in the dataset PPATHL

  • Partnerships of respondents with net codes between 40 and 49 were dissolved and will be coded -2 “does not apply” in the future.
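
The PARID rule above amounts to a conditional recode, sketched below in Python under stated assumptions. The field name `netto` for the net code and the lowercase `parid` key are assumptions for the example, and "between 40 and 49" is read here as inclusive of both endpoints.

```python
DOES_NOT_APPLY = -2  # code named in the text for dissolved partnerships


def dissolve_partnerships(records):
    """Set parid to -2 for records whose net code lies in 40..49 (inclusive).

    records: list of dicts with "netto" and "parid" keys (hypothetical names).
    Returns new dicts; the input records are not modified.
    """
    updated = []
    for rec in records:
        new_rec = dict(rec)
        if 40 <= rec["netto"] <= 49:
            new_rec["parid"] = DOES_NOT_APPLY
        updated.append(new_rec)
    return updated


# Hypothetical example records
records = [
    {"pid": 1, "netto": 10, "parid": 2},
    {"pid": 2, "netto": 45, "parid": 1},
]
recoded = dissolve_partnerships(records)
```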

5.8. Variable HGOWNER in the dataset HGEN

  • In samples M3-M5 in 2017, several missing values in the variable HGOWNER were replaced with the information that a household is living in a shelter or housing for refugees.

5.9. Dataset INTERVIEWER

  • The year 2016 now contains information from Samples L2-M4.
  • The variable on the length of the interview (LENGTHINT) was eliminated and replaced by three variables, each of which gives the average length of one questionnaire (LENGTHINT with the suffix H, P, or J).
  • The youth surveys, which were previously counted in the number of interviews per person (AMOUNTINTP), now have their own variable (AMOUNTINTJ).

5.10. Dataset BIOAGE17

  • Previous versions of BIOAGE17 contained the identifiers of the respondent’s mother (BYMNR) and father (BYVNR). The parents’ identifiers are now found in BIOPAREN (MNR and VNR) and can easily be merged with BIOAGE17.
  • Desired occupation variables ISCO88 have been replaced by ISCO08. The same is true for BYKLAS: The old 1992 version has been replaced by the 2010 version.
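
The merge of parent identifiers from BIOPAREN into BIOAGE17 could look roughly like the Python sketch below. It assumes that both datasets share a person identifier, called `pid` here, and that rows are available as dictionaries with lowercase keys. Both are assumptions for illustration; only the variable names MNR and VNR come from the text.

```python
def attach_parent_ids(bioage17_rows, bioparen_rows, key="pid"):
    """Left-join the parent identifiers (mnr, vnr) onto BIOAGE17 rows.

    key: name of the shared person identifier (assumed to be "pid").
    Rows without a match in BIOPAREN get None for both identifiers.
    """
    # Build a lookup: person id -> (mother id, father id)
    parents = {row[key]: (row.get("mnr"), row.get("vnr")) for row in bioparen_rows}
    merged = []
    for row in bioage17_rows:
        mnr, vnr = parents.get(row[key], (None, None))
        merged.append({**row, "mnr": mnr, "vnr": vnr})
    return merged


# Hypothetical example rows
bioage17 = [{"pid": 101}, {"pid": 102}]
bioparen = [{"pid": 101, "mnr": 11, "vnr": 12}]
merged = attach_parent_ids(bioage17, bioparen)
```

A left join keeps every BIOAGE17 row, so youths without an entry in BIOPAREN remain in the result with empty parent identifiers.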

5.11. Dataset BIOAGEL

  • The internal distinction between BIOAGE 8a and 8b, or between 81 and 82, has been eliminated, meaning that the dataset BIOAGEL now contains one line per child and respondent for questionnaires about 7-8-year-old children. As a result, when each parent completed a questionnaire on a child in a given year, there are two lines for that child (one line per parent). These can be identified by the different PIDE (PID of the respondent).
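
To find the children for whom both parents completed a questionnaire in the same year, the rows can be grouped by child and year and the distinct PIDE values counted. The following Python sketch illustrates this. The keys `child_id` and `syear` are hypothetical stand-ins for the child identifier and survey year; only PIDE is named in the text.

```python
from collections import defaultdict


def respondents_per_child(rows):
    """Group the responding parents' PIDE values by (child, survey year)."""
    groups = defaultdict(set)
    for row in rows:
        groups[(row["child_id"], row["syear"])].add(row["pide"])
    return dict(groups)


def children_with_two_reports(rows):
    """Return the (child, year) pairs answered by more than one respondent."""
    return {
        child_year: sorted(pides)
        for child_year, pides in respondents_per_child(rows).items()
        if len(pides) > 1
    }


# Hypothetical example rows: child 1 has reports from two parents in 2018
rows = [
    {"child_id": 1, "syear": 2018, "pide": 501},
    {"child_id": 1, "syear": 2018, "pide": 502},
    {"child_id": 2, "syear": 2018, "pide": 601},
]
double_reports = children_with_two_reports(rows)
```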