The new data distribution (1984–2013) “SOEP v30” provides, for the most recent survey year 2013, the usual wave-specific data files BDPBRUTTO, BDP, BDPKAL, BDPGEN, BDPAGE17, BDHBRUTTO, BDH, BDHGEN, BDKIND, and BCPLUECKE as well as the updated files with a longitudinal component (PFAD files, biography files, spell data, and weighting factors). Additional new samples, datasets, or variables are listed below:
1. Cross-sectional weights 2013
Now that the figures from the official statistical agencies are available, we are pleased to provide the finalized weighting variables in this version of the data (doi:10.5684/soep.v30). As is always the case in years with refresher and enlargement samples, we provide weights for the old and new samples, both separately and combined. These different sets of weights are designed to make it easier for users to study how the integration of a new sample affects the analysis of specific research topics.
Please also note that the government census carried out in 2011 replaced the projected population figures, which had been regularly updated based on the last census in 1987, with current population figures from the Federal Statistical Office. This means that the post-stratification of SOEP weights for wave BD in data release v30 is based on a version of the 2013 Microcensus that takes the 2011 census into account for the first time. It is therefore possible that changes in weighted analyses of the SOEP between 2012 (BC) and 2013 (BD) are the result of the government statistics switching over to the more recent census. The correction is evident in the fact that the estimated total number of individuals living in private households in Germany fell from 81 million in 2012 to less than 80 million in 2013.
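To illustrate why a change in the post-stratification margins shifts weighted results, note that an estimated population total is the sum of the cross-sectional weights over respondents. The following is a minimal Python sketch under stated assumptions: the record layout, the field name `weight`, and all numbers are hypothetical and are not taken from the SOEP data files.

```python
def weighted_total(records, weight_key="weight"):
    """Estimate a population total as the sum of cross-sectional weights."""
    return sum(r[weight_key] for r in records)


def weighted_mean(records, value_key, weight_key="weight"):
    """Weighted mean of one variable; fails if the weights sum to zero."""
    total = weighted_total(records, weight_key)
    if total == 0:
        raise ValueError("sum of weights is zero")
    return sum(r[value_key] * r[weight_key] for r in records) / total


# Hypothetical toy data: each respondent represents `weight` persons.
sample = [
    {"age": 30, "weight": 1500.0},
    {"age": 50, "weight": 2500.0},
]

print(weighted_total(sample))        # estimated number of persons
print(weighted_mean(sample, "age"))  # weighted mean age
```

If the weights are rescaled to new population margins, both the estimated total and any weighted statistic can change even when the underlying responses do not.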
Given the retrospective revision of the 2011 and 2012 Microcensus data to account for the census results, our next data release (soep.v31) will include retrospectively revised weighting variables for the 2011 and 2012 survey data.
If you have any comments on the weighting variables, we would be happy to hear from you (mkroh@diw.de).
2. New IAB-SOEP Migration Sample (Sample M)
The new IAB-SOEP Migration Sample (Sample M) is a joint project with the Institute for Employment Research (IAB). It is therefore provided both as part of the normal SOEP distribution (see, for example, the variable psample in the dataset ppfad) and as a separate study including only Sample M households (doi:10.5684/soep.iab-soep-mig.2013).
The new sample takes into account changes in the structure of migration to Germany since 1995. It covers not only direct immigration but also the “second generation,” the children of immigrants. The new sample opens up new perspectives for migration research and provides insights into the lives of new immigrants to Germany. The new sample has the following key features:
3. New datasets / variables
3.1. MIGSPELL
3.2. BDP_MIG
The original data from the Sample M specific survey instrument is included in the dataset BDPMIG, combining the individual and the biographical questionnaire. The variables are also included in the other standard or generated datasets:
3.3. JOBEND$$
Since the categories for reasons for job dismissal have changed several times, a new longitudinally consistent variable (JOBEND$$) is now offered in the $PGEN datasets.
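Names such as JOBEND$$ and $PGEN contain placeholders. The sketch below expands them for a given survey year. It assumes the common SOEP naming convention, which is not spelled out in this text: `$` stands for the wave letter(s), with A for 1984 and BC and BD for 2012 and 2013 as mentioned above, and `$$` stands for the two-digit survey year. The function names are hypothetical.

```python
import string

FIRST_YEAR = 1984  # wave A


def wave_prefix(year):
    """Return the wave letter(s) for a survey year (A=1984, ..., Z=2009, BA=2010, ...)."""
    offset = year - FIRST_YEAR
    if offset < 0 or offset >= 52:
        raise ValueError(f"no wave prefix defined in this sketch for {year}")
    letters = string.ascii_uppercase
    if offset < 26:
        return letters[offset]
    # After Z, the prefixes continue as BA, BB, BC, ...
    return "B" + letters[offset - 26]


def expand_name(pattern, year):
    """Replace '$$' with the two-digit year, then '$' with the wave prefix."""
    two_digit = f"{year % 100:02d}"
    return pattern.replace("$$", two_digit).replace("$", wave_prefix(year))


print(expand_name("$PGEN", 2013))
print(expand_name("JOBEND$$", 2013))
```

The `$$` placeholder is replaced first so that the single `$` replacement does not consume it.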
3.4. New additional occupation codes
The data on occupations in the individual questionnaire are now additionally coded using KldB2010 and partly also ISCO-08. The following variables are included in the dataset BDP:
Varname | Variable Label
bdp38_kldb2010 | Current Occupational Classification (KldB2010)
bdp38_isco08 | Current Occupational Classification (ISCO-08)
bdp81_kldb2010 | Current Occupational Classification Secondary Employment (KldB2010)
bdp81_isco08 | Current Occupational Classification Secondary Employment (ISCO-08)
bdp9005_trainkldb2010 | Vocational Training / Education Degree Prev. Yr. (KldB2010)
However, variables of derived scales (e.g., prestige scores in $PGEN) are still based on ISCO-88.
3.5. Grip strength data for 2012
GRIPSTR update: The data on grip strength from the survey year 2012 is now included in the GRIPSTR dataset.
3.6. Wealth data for 2012
PWEALTH and HWEALTH updated: In 2012, all individuals aged 17 and over were again surveyed on wealth, just as they were in 2002 and 2007. These “raw” data were already part of the standard data distribution for Wave 29 and will be included in the upcoming data distribution in a file containing the data for 2002, 2007, and 2012 in “long format”: PWEALTH for individual data and HWEALTH for data aggregated at the household level. Values that are missing due to item or partial unit non-response (e.g., missing interviews with individual household members in interviewed households) will be multiply imputed in complex procedures that take longitudinal information into account.
3.7. BIOEDU now part of the regular data distribution
After it became impossible to update the beta version of this data set in version 29, the data have now been updated and incorporated into the regular data distribution. The information from the new IAB-SOEP Migration Sample was also integrated.
3.8. INTERVIEWER dataset
The dataset comprises demographic and employment information about interviewers, aggregated data on the interviewers’ fieldwork in each wave, as well as personal details that they provided in the two interviewer surveys of 2006 and 2012. In the process of creating the INTERVIEWER dataset, all interviewer indicators (INTID) in all of the SOEP datasets were checked thoroughly and in some cases revised.
4. Revisions and Bug fixes
4.1. Corrections in BILZTCH$$ and BILZTEV$$
Until now, the variables BILZTCH$$ and BILZTEV$$ lacked information on a number of waves. As a result, false values were assigned in a number of cases: a total of 638 cases previously classified as consistent proved to be inconsistent increases in educational level, and 2,582 cases previously classified as inconsistent proved to be consistent.
4.2. Corrections in DUEBSTD
In addition to 1984 and 1985, overtime work has now been generated for 1987 as well. For these years, overtime hours result from the difference between contractually agreed working hours and the number of hours actually worked per week.
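The rule described above can be sketched as follows. This is an illustration only, not the generation code used for DUEBSTD: the function name is hypothetical, negative inputs are assumed to be missing-value codes, and the text does not say how cases with fewer actual than agreed hours are treated, so the raw difference is returned.

```python
def weekly_overtime(agreed_hours, actual_hours):
    """Overtime per week as actual minus contractually agreed hours.

    Returns None when either input is missing. Negative inputs are
    treated as missing-value codes (an assumption of this sketch).
    """
    if agreed_hours is None or actual_hours is None:
        return None
    if agreed_hours < 0 or actual_hours < 0:
        return None
    return actual_hours - agreed_hours


print(weekly_overtime(38.5, 42.0))  # 3.5 hours of overtime
print(weekly_overtime(-1, 40.0))    # None: agreed hours missing
```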
4.3. Revisions of marital and relationship status
$FAMSTD: As a result of a new process for generating BIOMARSM/Y and BIOCOUPLM/Y, two changes occurred in $FAMSTD: Since 2010 the question on marital status has included the categories “registered same-sex partnership, living together” and “registered same-sex partnership, not living together”. These two categories are also included in $FAMSTD as values “7” and “8”. Furthermore, all spells of BIOMARSM/Y in the category “widowed or divorced” have been set to “not valid” in $FAMSTD. These changes were also applied to previous waves. The variable $FAMSTD is set to -3 if information is implausible, to -5 if persons were not interviewed, and to -1 if persons did not answer the question.
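When analysing $FAMSTD, the three missing codes named above usually need to be separated from substantive values. The sketch below handles only those three codes (-1, -3, -5). Any other codes the variable may carry are passed through unchanged, and the function name and example values are hypothetical.

```python
from collections import Counter

# Missing codes for $FAMSTD as described in the text above.
FAMSTD_MISSING = {
    -1: "no answer",
    -3: "implausible",
    -5: "not interviewed",
}


def split_famstd(values):
    """Separate substantive values from the three documented missing codes.

    Returns (kept_values, counts_of_missing_reasons).
    """
    kept = [v for v in values if v not in FAMSTD_MISSING]
    reasons = Counter(FAMSTD_MISSING[v] for v in values if v in FAMSTD_MISSING)
    return kept, dict(reasons)


# Hypothetical example values, including the new categories 7 and 8.
kept, reasons = split_famstd([1, 7, -1, -5, 8, -3, -1])
print(kept)
print(reasons)
```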
BIOMARSM/Y: Because BIOMARSM is derived from the new version of BIOCOUPLM, we have copied the category “married, separated” from BIOCOUPLM. It reflects the time between a reported separation and divorce or the death of the spouse. Most of these spells of BIOCOUPLM were set to “married” in BIOMARSM, but for those spells without a reported end, event spells were set to “married, separated” and the end of the spells to missing. Parallel spells from the category “divorced or widowed” were added, whereas the outset of those spells was set to missing. Finally, a new category “added spell” has been introduced into the variable remark, which lets you distinguish between spells that have been edited (value 2) and spells that have been added (value 3). For further information, please see the new documentation on BIOCOUPLM/Y. The variable SPELLTYP is set to -3 if information is implausible.
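As an illustration of how the remark values can be used, the following sketch partitions spell records into edited, added, and all remaining spells. The values 2 and 3 come from the text above. The record layout, the extra field `spellnr`, and the function name are hypothetical.

```python
REMARK_EDITED = 2  # spell was edited
REMARK_ADDED = 3   # spell was added


def partition_spells(spells):
    """Group spell records by their remark value."""
    edited = [s for s in spells if s["remark"] == REMARK_EDITED]
    added = [s for s in spells if s["remark"] == REMARK_ADDED]
    other = [s for s in spells
             if s["remark"] not in (REMARK_EDITED, REMARK_ADDED)]
    return edited, added, other


# Hypothetical spell records.
spells = [
    {"spellnr": 1, "remark": 0},
    {"spellnr": 2, "remark": 2},
    {"spellnr": 3, "remark": 3},
]
edited, added, other = partition_spells(spells)
print(len(edited), len(added), len(other))
```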
4.4. $regtyp: conversion to urban / rural area
The new typology of the German BBSR describes the settlement structure using four types of regions. Using these four categories, however, would make it possible to identify specific administrative districts (Landkreise) in the federal states of Saxony, Mecklenburg-Western Pomerania, and Baden-Württemberg. Therefore, we must use a condensed two-category classification: urban and rural areas.