SOEP-Core v30 - Changes in the Dataset

The new data distribution (1984–2013) “SOEP v30” provides, for the most recent survey year 2013, the usual wave-specific data files BDPBRUTTO, BDP, BDPKAL, BDPGEN, BDPAGE17, BDHBRUTTO, BDH, BDHGEN, BDKIND, and BCPLUECKE as well as the updated files with a longitudinal component (PFAD files, biography files, spell data, and weighting factors). Additional new samples, datasets, or variables are listed below:

1. Cross-Sectional Weights 2013

With the figures now available from the official statistical agencies, we are pleased to provide the finalized weighting variables in this version of the data (doi:10.5684/soep.v30). As always in years with refresher and enlargement samples, we provide weights for the old and new samples, both separately and combined. These different sets of weights make it easier for users to study how integrating a new sample affects the analysis of specific research topics.

Please also note that the government census carried out in 2011 replaced the projected population figures, which had been regularly updated on the basis of the last census in 1987, with current population figures from the Federal Statistical Office. As a result, the post-stratification of the SOEP weights for wave BD in data release v30 is based on a version of the 2013 Microcensus that takes the 2011 census into account for the first time. Changes in weighted SOEP analyses between 2012 (BC) and 2013 (BD) may therefore result from the official statistics switching to the more recent census. The correction is evident in the fact that the estimated total number of individuals living in private households in Germany fell from 81 million in 2012 to less than 80 million in 2013.
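The population totals mentioned above are the kind of figure obtained by summing cross-sectional weights over all respondents. The following minimal Python sketch illustrates that idea with invented toy values; the record layout, the weights, and the function names are assumptions made for illustration and do not correspond to actual SOEP variables.

```python
# Toy records: (person id, cross-sectional weight, employed flag).
# The layout and all values are invented for illustration only.
records = [
    (1, 1500.0, True),
    (2, 2300.0, False),
    (3, 1200.0, True),
]


def weighted_total(rows):
    """Sum of weights: the estimated number of people the sample represents."""
    return sum(weight for _, weight, _ in rows)


def weighted_share(rows):
    """Weighted share of rows whose flag is True."""
    total = weighted_total(rows)
    if total == 0:
        raise ValueError("weights sum to zero")
    return sum(weight for _, weight, flag in rows if flag) / total


print(weighted_total(records))
print(round(weighted_share(records), 3))
```

Running the same calculation once with the weights for the old samples and once with the combined weights is one simple way to see how much a new sample shifts an estimate.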

Given the retrospective revision of the 2011 and 2012 Microcensus data to account for the census results, our next data release (soep.v31) will include retrospectively revised weighting variables for the 2011 and 2012 survey data.

If you have any comments on the weighting variables, we would be happy to hear from you.

2. New IAB-SOEP Migration Sample (Sample M)

The new IAB-SOEP Migration Sample (Sample M) is a joint project with the Institute for Employment Research (IAB). It is provided as part of the normal SOEP distribution (see, for example, the variable psample in the dataset ppfad) and also as a separate study that includes only Sample M households (doi:10.5684/soep.iab-soep-mig.2013).

The new sample takes into account changes in the structure of migration to Germany since 1995. It covers not only direct immigration but also the “second generation,” the children of immigrants. The new sample opens up new perspectives for migration research and provides insights into the lives of new immigrants to Germany. The new sample has the following key features:

  1. The IAB-SOEP Migration Sample substantially increases the sample size for research on migration and the lives of immigrants in Germany: 4,964 persons residing in 2,723 households participated in the first wave of the survey. Moreover, since the survey is included in the regular SOEP as subsample “M”, including migrants from the other SOEP samples in analyses may increase the number of observations further.
  2. The questionnaire used with the new migration sample covers respondents’ entire migration biography, including migration episodes to countries other than Germany. This is an important extension over previous SOEP surveys of immigrants’ personal biographies. For the first time, we can track whether important events in individual biographies occurred in the respondent’s home country, in Germany, or in other destination countries. This also takes into account that migration is no longer a one-time event that lasts for a lifetime: individual biographies are becoming increasingly “transnational,” often with several migration episodes during an individual’s lifetime and personal ties in different countries. To make these data easier to use, we created a user-friendly spell dataset called MIGSPELL.
  3. Following recent advances in research on migration and immigration, the IAB-SOEP Migration Sample includes numerous new sets of questions that were not previously covered in the SOEP or other household surveys in Germany, at least not in the necessary depth. Examples of such question blocks are: earnings, labor force status, and occupational status before migration; migration decisions in the family and partnership context; and the purposes and channels of remittances.

3. New datasets / variables


3.1. MIGSPELL

For the comprehensively surveyed migration biography, we have created a user-friendly spell dataset, MIGSPELL. Detailed documentation will be available in the biographical data documentation of the SOEP.

3.2. BDP_MIG

The original data from the survey instrument specific to Sample M are included in the dataset BDP_MIG, which combines the individual and the biographical questionnaire. The variables are also included in the other standard or generated datasets:
  • Variables equivalent to variables in the individual questionnaire of other samples are included in the dataset BDP
  • Variables equivalent to variables in the biography questionnaire of other samples are included in the respective biography dataset (e.g. BIOMARSM)
  • The comprehensively surveyed migration biography can be found in the new dataset MIGSPELL.

3.3. JOBEND$$

Since a number of changes occurred in the categories for reasons for job dismissal, a new longitudinally consistent variable (JOBEND$$) is now offered in the $PGEN datasets.
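Names such as $PGEN and JOBEND$$ are placeholders that stand for one concrete name per wave. The sketch below shows one way to expand them in Python. It assumes that "$" stands for the wave letters and "$$" for the two-digit survey year; this text only confirms BC for 2012 and BD for 2013, so the letter scheme for earlier years (A for 1984 through Z for 2009, then BA from 2010) is an assumption, and both function names are hypothetical helpers.

```python
def wave_prefix(year):
    """Return the wave letter(s) for a survey year, e.g. 2013 -> 'BD'."""
    if not 1984 <= year <= 2013:
        raise ValueError("this release covers survey years 1984-2013")
    if year <= 2009:
        # Assumed scheme: A (1984) through Z (2009)
        return chr(ord("A") + year - 1984)
    # Assumed scheme: BA (2010) through BD (2013)
    return "B" + chr(ord("A") + year - 2010)


def expand(template, year):
    """Fill '$$' with the two-digit year first, then '$' with the wave prefix."""
    name = template.replace("$$", f"{year % 100:02d}")
    return name.replace("$", wave_prefix(year))


print(expand("$PGEN", 2013))
print(expand("JOBEND$$", 2013))
```

Replacing "$$" before "$" matters: doing it the other way round would turn the year placeholder into two copies of the wave prefix.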

3.4. New additional occupation codes

The data on occupations in the individual questionnaire are now additionally coded using KldB2010 and partly also ISCO-08. The following variables, listed here by variable label, are included in the dataset BDP:

  • Current Occupational Classification (KldB2010)
  • Current Occupational Classification (ISCO-08)
  • Current Occupational Classification Secondary Employment (KldB2010)
  • Current Occupational Classification Secondary Employment (ISCO-08)
  • Vocational Training / Education Degree Prev. Yr. (KldB2010)

However, variables based on derived scales (e.g. prestige scores in $PGEN) are still based on ISCO-88.

3.5. Grip strength data for 2012

GRIPSTR update: The data on grip strength from the survey year 2012 is now included in the GRIPSTR dataset.

3.6. Wealth data for 2012

PWEALTH and HWEALTH updated: In 2012, all individuals aged 17 and over were again surveyed on wealth, just as they were in 2002 and 2007. These “raw” data were already part of the standard data distribution for Wave 29 and will be included in the upcoming data distribution in a file containing the data for 2002, 2007, and 2012 in “long format”: PWEALTH for individual data and HWEALTH for data aggregated at the household level. Values that are missing due to item or partial unit non-response (e.g., missing interviews with individual household members in interviewed households) will be subjected to multiple imputation in complex procedures that take longitudinal information into account.
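For readers unfamiliar with the term, “long format” means one row per person and survey year instead of one row per person with a separate column for each year. The following Python sketch illustrates the reshaping with invented toy values; the column names (pid, networth_2002, and so on) are hypothetical and are not the names used in PWEALTH or HWEALTH.

```python
WEALTH_YEARS = (2002, 2007, 2012)


def to_long(wide_rows, years=WEALTH_YEARS):
    """Turn one row per person into one row per person and survey year."""
    long_rows = []
    for row in wide_rows:
        for year in years:
            key = f"networth_{year}"  # hypothetical column name
            if key in row:  # skip years in which the person was not observed
                long_rows.append(
                    {"pid": row["pid"], "year": year, "networth": row[key]}
                )
    return long_rows


# Invented toy data: person 2 has no observation for 2002.
wide = [
    {"pid": 1, "networth_2002": 10000, "networth_2007": 15000, "networth_2012": 12000},
    {"pid": 2, "networth_2007": 500, "networth_2012": 800},
]

for row in to_long(wide):
    print(row)
```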

3.7. BIOEDU now part of the regular data distribution

After it became impossible to update the beta version of this data set in version 29, the data have now been updated and incorporated into the regular data distribution. The information from the new IAB-SOEP Migration Sample was also integrated.

3.8. INTERVIEWER dataset

The dataset comprises demographic and employment information about interviewers, aggregated data on the interviewers’ fieldwork in each wave, as well as personal details that they provided in the two interviewer surveys of 2006 and 2012. In the process of creating the INTERVIEWER dataset, all interviewer indicators (INTID) in all of the SOEP datasets were checked thoroughly and in some cases revised.

4. Revisions and Bug fixes

4.1. Corrections in BILZTCH$$ and BILZTEV$$

Until now, the variables BILZTCH$$ and BILZTEV$$ lacked information on a number of waves. As a result, incorrect values were assigned in a number of cases: 638 cases previously classified as consistent proved to involve inconsistent increases in educational level, and 2,582 cases previously classified as inconsistent proved to be consistent.

4.2. Corrections in DUEBSTD

In addition to 1984 and 1985, overtime work has now been generated for 1987 as well. For these years, overtime hours result from the difference between contractually agreed working hours and the number of hours actually worked per week.
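The calculation described above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the procedure used to generate DUEBSTD: the function name is hypothetical, clipping negative differences to zero is an assumption, and so is the treatment of negative input values as missing codes.

```python
def weekly_overtime(agreed_hours, actual_hours):
    """Overtime as actual minus contractually agreed weekly hours.

    Returns None if either input is missing (None or a negative code).
    Clipping at zero is an assumption: working fewer hours than agreed
    is treated here as zero overtime rather than negative overtime.
    """
    for value in (agreed_hours, actual_hours):
        if value is None or value < 0:
            return None
    return max(actual_hours - agreed_hours, 0.0)


print(weekly_overtime(38.5, 42.0))
print(weekly_overtime(40.0, 35.0))
print(weekly_overtime(-1, 40.0))
```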

4.3. Revisions of marital and relationship status

$FAMSTD: As a result of a new process for generating BIOMARSM/Y and BIOCOUPLM/Y, two changes occurred in $FAMSTD. First, since 2010 the question on marital status has included the categories “registered same-sex partnership, living together” and “registered same-sex partnership, not living together”. These two categories are now also included in $FAMSTD as values “7” and “8”. Second, all spells of BIOMARSM/Y in the category “widowed or divorced” have been set to “not valid” in $FAMSTD. These changes were also applied to previous waves. The variable $FAMSTD is set to -3 if the information is implausible, to -5 if the person was not interviewed, and to -1 if the person did not answer the question.
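The codes named in this paragraph can be handled with a small lookup, sketched here in Python. Only the values -1, -3, -5, 7, and 8 come from the text above; the function name is hypothetical, other valid categories of $FAMSTD are deliberately left unlabeled, and treating every other negative value as missing is an assumption.

```python
MISSING_REASONS = {
    -1: "no answer",
    -3: "implausible",
    -5: "not interviewed",
}

SAME_SEX_PARTNERSHIP = {
    7: "registered same-sex partnership, living together",
    8: "registered same-sex partnership, not living together",
}


def describe_famstd(code):
    """Return (status, label) for a $FAMSTD value.

    Labels are given only for the codes named in the release notes;
    every other code yields a label of None.
    """
    if code < 0:
        # Assumption: any negative value is a missing code.
        return ("missing", MISSING_REASONS.get(code))
    return ("valid", SAME_SEX_PARTNERSHIP.get(code))


for code in (-5, -3, -1, 7, 8):
    print(code, describe_famstd(code))
```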

BIOCOUPLM/Y: BIOCOUPLM is generated from the current relationship status and reported changes in the family situation. Although the questionnaire asks for such events on a monthly basis, numerous changes in relationship status are not reported as events. In the new version of BIOCOUPLM, we have therefore included a censor variable called “events”, which indicates whether the exact month of an event is known or whether the beginning or end of a spell reflects the month of the interview because no event was reported. Finally, a new category “added spell” has been introduced into the variable remark, which lets you distinguish between spells that have been edited (value 2) and spells that have been added (value 3). For further information, please see the new documentation on BIOMARSM/Y. The variable SPELLTYP is set to -3 if the information is implausible.

BIOMARSM/Y: Because BIOMARSM is derived from the new version of BIOCOUPLM, we have copied the category “married, separated” from BIOCOUPLM. It reflects the time between a reported separation and divorce or the death of the spouse. Most of these BIOCOUPLM spells were set to “married” in BIOMARSM, but spells without a reported end event were set to “married, separated”, with the end of the spell set to missing. Parallel spells in the category “divorced or widowed” were added, with the start of those spells set to missing. Finally, a new category “added spell” has been introduced into the variable remark, which lets you distinguish between spells that have been edited (value 2) and spells that have been added (value 3). For further information, please see the new documentation on BIOCOUPLM/Y. The variable SPELLTYP is set to -3 if the information is implausible.
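Users who want to check how sensitive their results are to edited or added spells can separate the spells by the value of remark. The following Python sketch uses invented toy spells; only the values 2 (edited) and 3 (added) are taken from the text above, while the dictionary keys and the function name are hypothetical.

```python
EDITED = 2  # spell was edited
ADDED = 3   # spell was added


def split_by_remark(spells):
    """Group spells into edited, added, and all other spells."""
    groups = {"edited": [], "added": [], "other": []}
    for spell in spells:
        if spell["remark"] == EDITED:
            groups["edited"].append(spell)
        elif spell["remark"] == ADDED:
            groups["added"].append(spell)
        else:
            groups["other"].append(spell)
    return groups


# Invented toy spells; "spellnr" and the remark value 0 are placeholders.
spells = [
    {"pid": 1, "spellnr": 1, "remark": 0},
    {"pid": 1, "spellnr": 2, "remark": 2},
    {"pid": 2, "spellnr": 1, "remark": 3},
]

groups = split_by_remark(spells)
print({name: len(rows) for name, rows in groups.items()})
```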

4.4. $regtyp: conversion to urban / rural area

The new typology of the German BBSR describes the settlement structure using four types of regions. Using these four categories, however, would make it possible to identify specific administrative districts (Landkreise) in the federal states of Saxony, Mecklenburg-Western Pomerania, and Baden-Württemberg. We therefore have to use a condensed two-category classification: urban and rural areas.