Changes in the Dataset

Data Updates


Data distribution v32 (Wave BF)

SOEP-Core soep.v32.1

  • BIOCOUPLY and BIOMARSY: In the first version of the data delivery, incorrect data were mistakenly uploaded for these two datasets. This version contains the correct datasets.
  • NACE in BFP and BFPGEN: A user reported implausible values for the variables BFP55_NACE and NACE15, which contain information on the industry of the current job. In this version, the information has been updated after a bug in the script was fixed.
  • Scale shift in BFP: In the v32 data release, the BFP scales on the probability of specific events occurring in working life, which in previous years had been coded from 0 to 100 in steps of 10, were stored on a scale from 0 to 10 for the CAPI and CAWI interviews. This update corrects the inconsistency by restoring the previously used coding: the values of bfp4201, bfp4202, bfp4203, bfp7201, bfp7202, and bfp7203 were multiplied by 10 where bfpinta = 9 or 10; in addition, one case in bfp7201 was changed from 4 to 40 where bfpinta = 8.
  • einstieg_artk and einstieg_pbio: Since data version 32, SOEP has offered two additional labor market entry variables as part of the BIOJOB file. They were constructed from employment history information accurate to the year and month, and they refer to a generic uniform definition of the first survey period after the transition from the educational system to the labor market. The construction of these variables is documented in detail in SOEP Survey Paper 429; a short version of the description is also available in the BIOJOB documentation (SOEP Survey Paper 418).
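For users who still work with the original v32 files, the rescaling rule described under "Scale shift in BFP" can be sketched in Python as follows. The record layout (one dict per respondent) is a hypothetical choice for illustration, and leaving negative values (SOEP missing codes) untouched is our assumption. soep.v32.1 already contains the corrected values, so applying this to v32.1 data would rescale them a second time; the single bfp7201 case with bfpinta = 8 is not covered.

```python
# Variables whose 0-10 answers belong on the 0-100 scale (from the v32.1 note).
PROBABILITY_VARS = ("bfp4201", "bfp4202", "bfp4203",
                    "bfp7201", "bfp7202", "bfp7203")
# bfpinta values for which the original v32 release stored the 0-10 scale.
AFFECTED_MODES = {9, 10}


def rescale_probabilities(record: dict) -> dict:
    """Return a copy of one respondent's record with 0-10 answers rescaled to 0-100.

    Only intended for original v32 data; soep.v32.1 is already corrected.
    """
    fixed = dict(record)  # never modify the caller's record
    if fixed.get("bfpinta") not in AFFECTED_MODES:
        return fixed
    for var in PROBABILITY_VARS:
        value = fixed.get(var)
        # Assumption: negative values are SOEP missing codes and stay unchanged.
        if isinstance(value, int) and value >= 0:
            fixed[var] = value * 10
    return fixed


if __name__ == "__main__":
    row = {"bfpinta": 9, "bfp4201": 7, "bfp7203": -1}
    print(rescale_probabilities(row))  # {'bfpinta': 9, 'bfp4201': 70, 'bfp7203': -1}
```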

SOEP-Core soep.v32

The new data release (1984–2015) "SOEP.v32" provides, for the most recent survey year 2015, the usual wave-specific data files BFPBRUTTO, BFP, BFPEQUIV, BFP_MIG, BFPKAL, BFPGEN, BFPAGE17, BFHBRUTTO, BFH, BFHGEN, BFKIND, and BEPLUECKE as well as the updated files with a longitudinal component (PFAD files, biography files, spell data, and weighting factors).

1. New migrant subsample (M2)

In 2013, we conducted the first IAB-SOEP Migration Sample in partnership with the Institute for Employment Research (IAB) in Nuremberg (for an overview of M1, see SOEP Survey Paper 216). The households of the second IAB-SOEP Migration Sample, surveyed in 2015, are now also included in the SOEP data. Its target population consists of immigrants who arrived in Germany between 2010 and 2013, a group dominated by migrants from the new EU member states in Eastern Europe. This focus will make it possible to better describe the recent dynamics of immigration to Germany. Like sample M1, sample M2 was drawn from register data of the Federal Employment Agency; it comprises 1,096 households.

Record Linkage

Please note that data from both samples can be linked with administrative employment and income data; survey respondents are asked for their explicit consent to record linkage. However, because the linked dataset contains social data, these weakly anonymized data are accessible only via the Research Data Center of the German Federal Employment Agency at the IAB (FDZ IAB): researchers can access them during a guest visit to the IAB or through remote data processing, also arranged with the IAB. The linked data will soon be available to external researchers. Requests for data access should be directed to the FDZ IAB, because a data use contract with the IAB is required.

For more information, see the FDZ IAB website.

2. Weighting

  • In version v32 of the SOEP data, the new migrant subsample M2 has been integrated into the SOEP weighting framework. As usual when a new sample is integrated into the SOEP, we provide different weighting factors for its first wave. The standard weights (bfhhrf/bfphrf) allow researchers to draw inferences about the population of residents in Germany based on all SOEP samples. The variables bfhhrfam1/bfphrfam1 allow the same inferences using only the data from the existing Samples A to M1. Comparing the two sets of weights thus enables researchers to gauge how the recent enlargement of the SOEP affects population estimates. The weights specific to the enlargement sample M2, bfhhrfm2/bfphrfm2, allow inferences about the target population of people who immigrated to Germany between 2010 and 2013.
  • The adjustment of weights to census margins at the individual level has been revised for all years back to 1984: the margins are now the numbers of women and men in each five-year age group. Previously, sex and age group were used as two separate margins.
  • Upon request, we now provide weighting factors for survey years 2010 to 2013 (waves BA to BD) that exclude Samples L1 to L3. Because the survey instruments used with Samples L1 to L3 in these waves, as part of the "Familien in Deutschland" (Families in Germany) survey, differed from those of the other samples, such weights may be needed when analyzing variables that were not surveyed in Samples L1 to L3.
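The revised calibration step, which matches weights to joint sex-by-age-group counts instead of two separate margins, can be illustrated with a single post-stratification pass. This Python sketch is a simplification under stated assumptions: the record layout and the example margins are hypothetical, every sample cell must have a population margin and a positive weight sum, and the actual SOEP weighting procedure involves further steps and margins not shown here.

```python
from collections import defaultdict


def age_group(age: int) -> int:
    """Lower bound of the five-year age category (e.g. 22 -> 20)."""
    return (age // 5) * 5


def poststratify(persons, margins):
    """Rescale weights so weighted counts match joint (sex, age group) margins.

    persons: list of dicts with hypothetical keys "sex", "age", "weight".
    margins: dict mapping (sex, age_group) to a population count.
    Raises KeyError if a sample cell has no margin.
    """
    # Current weighted count per joint cell.
    cell_totals = defaultdict(float)
    for p in persons:
        cell_totals[(p["sex"], age_group(p["age"]))] += p["weight"]

    # Multiply each weight by (population count / weighted sample count) of its cell.
    adjusted = []
    for p in persons:
        cell = (p["sex"], age_group(p["age"]))
        factor = margins[cell] / cell_totals[cell]
        adjusted.append({**p, "weight": p["weight"] * factor})
    return adjusted
```

With two separate margins, the same goal would require alternating between the sex and the age adjustments (raking) until both are met; a joint margin fixes each cell in one pass.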

3. Changed datasets or variables

  • MIGSPELL: With the integration of the data from 2013 (BD) to 2015 (BF), substantial changes to the number and coding of the MIGSPELL variables were necessary, in particular because the status upon entry to Germany was surveyed with differing degrees of detail in the individual waves. In addition, an improved procedure for imputing missing data was introduced. A detailed description of the new version of MIGSPELL can be found in the SOEP 2015 documentation on Biography and Life History Data (coming soon).
  • Variables connected to occupations:
    - The variable names have changed and should now be more informative: the name of the coding scheme is now part of the variable name (e.g., isco88).
    - The occupational codes (KldB92, ISCO-88) now comply better with official standards (e.g., variables with suffixes _kldb92 or _isco88 in $P files).
    - In $PGEN there are now also variables using the coding schemes for KldB2010 and ISCO-08.
    - The code for generating the derived prestige scales has been redesigned, and the variable names now include scheme and year, e.g., egp88_12 for the EGP class based on ISCO-88 in 2012.
  • BIOIMMIG: The variable biwfam ("Already Had Family In Country") was recoded incorrectly in the generated dataset for the migration samples in 2013 and 2014. This has been corrected in the current data release.
  • Survey year: As of Version 32, variables referring to the survey year are consistently named syear. Previously, a few variables had names such as erhebj and svyyear.

4. New datasets or variables

  • BIOIMMIG: Additional variable for the main reason for migrating to Germany (only available since 2014).
  • PFLEGE: A new variable, appraisal, labeled “officially assessed as in need of care”.
  • $PEQUIV: six new variables:
    - ichsu$$ Child support, caregiver alimony
    - fchsu$$ Imputation flag child support, caregiver alimony
    - ispou$$ Divorce alimony
    - fspou$$ Imputation flag divorce alimony
    - irie1$$ Riester pension plan
    - irie2$$ Riester widow pension plan

  • PPFAD: Person-related meta dataset
    - Some immigration variables (GERMBORN, CORIGIN, and IMMIYEAR) previously contained the code -3 for all respondents in Sample G, who were not asked to state their country of birth and year of immigration. Since respondents from other samples (e.g., Sample A) were not asked directly for this information either and were coded -2, missing values were coded inconsistently across samples. This inconsistency has been corrected in the new update (v32).
    - Respondents who immigrated in 1949 (the year the Federal Republic of Germany was founded) were previously, due to a coding error, considered not to have been born in Germany. This has been fixed in the updated version: in accordance with the German Microcensus, all persons who immigrated before 1950 (that is, in 1949 or earlier) are now considered to have been born in Germany. This also led to a change in the value label of IMMIYEAR.
    - The updated version of MIGINFO takes more information into account, which has changed some of its values.

Data distribution v31 (Wave BE)

1. Integration of the FiD study (data from 2010 onward)

We are pleased to announce that data release v31 will include the data from “Familien in Deutschland” (Families in Germany, FiD), which are being retrospectively integrated into the SOEP and made available in user-friendly form to all SOEP users. From 2010 to 2013, the survey was carried out in parallel to the SOEP as a so-called “SOEP-related study”.

The original SOEP-related study FiD

The aim of FiD was to evaluate the full range of public benefits for married couples and families in Germany on behalf of the Federal Ministry for Family Affairs. The available datasets, including the SOEP, were not sufficient for differentiated analyses of the population groups targeted by family policy. Particularly problematic were the very small shares of single parents, families with more than two children, low-income families, and families with very young children in the German population. These groups are of course included in the SOEP, but the number of observations is too small for sound statistical analysis.

Since 2010, the SOEP Research Infrastructure at DIW Berlin has been working in collaboration with TNS Infratest Sozialforschung to survey more than 4,500 households every year. The FiD sample consists of the following subsamples:

  • A sample of families in “critical income brackets”
  • A sample of single parents
  • A sample of families with more than two children
  • “Cohort samples” of the 2007, 2008, 2009, and 2010 (first quarter) birth cohorts.

A description of the original FiD study can be found in the article “Familien in Deutschland – FiD” by Mathis Schröder, Rainer Siegers, and C. Katharina Spieß, Schmollers Jahrbuch 133 (4), 2013, 595-606 (http://dx.doi.org/10.3790/schm.133.4.595). A preprint was published in 2013 as SOEPpapers 556 (Berlin: DIW Berlin).

Integration into SOEP-Core

Starting with Version 31 of the data, the FiD sample will be integrated completely into the SOEP-Core data, that is, as if it were a new sample drawn as part of SOEP-Core in 2010 and 2011. This integration increases the number of cases in SOEP-Core since 2010 by almost one-third. The figure shows how the new FiD samples L1 to L3 have affected the cross-sectional sample size since 2010. Because other subsamples have been added to SOEP-Core since 2010, the retrospective integration meant that the sample variables had to be adjusted (see 3.1, Adjustment of the psample / hsample variables).

Figure: Sample development (Stichprobenentwicklung)

In total, 14,166 variables from 64 datasets have been integrated into the various SOEP datasets, and the generated data sets or variables have been adjusted. Variables in the FiD survey instruments that were not contained in the corresponding SOEP survey instruments have been included in the respective datasets as additional variables (with the original FiD variable names starting with “fyy”, where “yy” is a two-digit year identifier). The table below gives an overview of the number of variables in each of the two main questionnaires that could be integrated.

Year  Individual questionnaire (–p)  Household questionnaire (–h)
      variables integrated           variables integrated
2010  314                            274
2011  472                            172
2012  350                            188
2013  363                            169
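To locate the FiD-only variables in a dataset, one could filter the variable names by the “fyy” convention described above. This Python sketch assumes that such names literally begin with "f" followed by a two-digit year identifier (e.g., "f10..." for 2010); the function name and the example variable names are hypothetical, so please check the result against the dataset documentation.

```python
import re

# Assumed naming convention: "f" + two-digit year identifier, e.g. "f10..." for 2010.
FID_PREFIX = re.compile(r"^f\d{2}", re.IGNORECASE)


def fid_only_variables(names):
    """Return the variable names that follow the FiD "fyy" naming convention."""
    return [name for name in names if FID_PREFIX.match(name)]


if __name__ == "__main__":
    # Hypothetical column names; only the "f" + two digits names are kept.
    print(fid_only_variables(["bap01", "f10var1", "F11var2", "fchsu10"]))
    # ['f10var1', 'F11var2']
```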

This means that from 2010 onward, SOEP users automatically have more cases in their study population without having to change their scripts. Of course, certain variables may not have been collected in FiD and are therefore unavailable for these cases. For these, please refer to our standard missing-value codes, which make this easy to see at the variable level:

Code  Meaning
-1    No answer / don’t know
-2    Does not apply
-3    Implausible value
-4    Inadmissible multiple response
-5    Not included in this version of the questionnaire
-6    Version of questionnaire with modified filtering
-8    Question not part of the survey program this year*

*Only applicable for datasets in long format.
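When checking whether a variable is available for the FiD cases, it helps to separate valid values from these codes before analysis. A minimal Python sketch, assuming values are read as plain integers; the function names are hypothetical. Only the codes listed in the table are treated as missing, so legitimately negative values of other variables are passed through unchanged.

```python
from collections import Counter

# SOEP missing-value codes, as listed in the table above.
MISSING_CODES = {
    -1: "no answer / don't know",
    -2: "does not apply",
    -3: "implausible value",
    -4: "inadmissible multiple response",
    -5: "not included in this version of the questionnaire",
    -6: "version of questionnaire with modified filtering",
    -8: "question not part of the survey program this year",
}


def split_value(value):
    """Return (valid_value, missing_label); exactly one of the two is None."""
    if value in MISSING_CODES:
        return None, MISSING_CODES[value]
    return value, None


def missing_reasons(values):
    """Count how often each missing-value code occurs in a column."""
    return Counter(MISSING_CODES[v] for v in values if v in MISSING_CODES)
```

For example, a column in which many FiD cases show -5 or -8 points to a question that was not part of the FiD instrument rather than to item non-response.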

2. Cross-sectional weights 2014

The Federal Statistical Office plans to adjust the already-released Microcensus data from 2011 and 2012 based on the 2011 census data. This means that in the present SOEP data release (v31), the weights for waves BB and BC will change due to the adjustment to the 2011 census data.

Including the households from the SOEP-related study FiD in v31 will increase the overall number of cases by around one-third and will also affect the integrated weighting variables, both because of the additional households and because official information on family types is taken into account in a more differentiated way in the weighting process. To allow users to test how a new sample may affect their research using the SOEP data, we provide both integrated weights and separate weights for the old and new samples in the year in which a refresher sample is integrated into the SOEP.

3. Other changes

3.1 Adjustment of the psample / hsample variables

Due to the retrospective integration of the FiD sample, the psample variable in ppfad and the corresponding hsample variable in hpfad had to be adjusted.

Sample variables (psample / hsample)

Value  Old label (v30)            New label (v31)
1      A German West              A Original Sample (DE-West)
2      B Foreigner West           B Migration (up to 1983, DE-West)
3      C German East              C Original Sample (DE-East)
4      D 84-93 Immigrant (West)   D 1994/5 Migration (1984-92/94 DE-West)
5      E Refreshment 1998         E 1998 Refreshment
6      F ISOEP 2000               F 2000 Refreshment
7      G High-Income Test 2002    G 2002 High-Income
8      H Refreshment 2006         H 2006 Refreshment
9      I Incentives 2009          I 2009 Incentivization
10     J Refreshment 2011         J 2011 Refreshment
11     K Refreshment 2012         K 2012 Refreshment
12     -                          L1 2010 Birth Cohorts (2007-2009)
13     M Migration 2013           L2 2010 Family Types
14     -                          L3 2011 Family Types
15     -                          M1 2013 Migration (1995-2010)
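Scripts written for v30 that select a sample by its psample value need updating: in particular, the old value 13 (Sample M) now denotes L2, while the 2013 migration sample has moved to 15 (M1). A minimal Python sketch of that translation, derived only from the table above; the helper name is hypothetical.

```python
# psample values in v30 and v31, per the table above.
UNCHANGED_CODES = range(1, 12)  # Samples A to K keep their values 1-11
RENUMBERED = {13: 15}           # v30 "M Migration 2013" -> v31 "M1 2013 Migration"


def psample_v30_to_v31(code: int) -> int:
    """Translate a v30 psample value into the corresponding v31 value."""
    if code in UNCHANGED_CODES:
        return code
    if code in RENUMBERED:
        return RENUMBERED[code]
    raise ValueError(f"no v30 psample value {code} in the table")
```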




3.2. Biographical data sets

The following datasets with biographical information were pooled to keep the number of life-course-related datasets at a reasonable level:

biobirth and biobirthm -> biobirth
As of v31, women’s (biobirth) and men’s (biobirthm) childbirth biographies are merged into the dataset biobirth, which of course includes a gender variable.

bioage01 to bioage12 -> bioagel
Starting with data distribution v31, the age-specific data from the mother-child and parent-child questionnaires are provided only in the user-friendly “long” format: instead of separate age-specific files (e.g., bioage01, bioage03, ...), all mother-child and parent-child questionnaires are now pooled in the bioagel dataset, so all information on children can be found in one dataset. The documentation on the biographical data includes syntax for generating the age-specific individual files for users who need them, as well as information on how to use the new bioagel “long” dataset efficiently with SPSS and Stata.

The dataset bioage17 derived from the youth questionnaire is not included in this bioagel dataset.
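The SOEP documentation provides SPSS and Stata syntax for recreating the age-specific files. As a language-neutral illustration of the same idea, the Python sketch below groups long-format records into one subset per age, mimicking the former bioage01, bioage03, ... files. The key name "age" is a hypothetical placeholder for whichever bioagel variable identifies the questionnaire's age group; check the bioagel documentation for the actual name.

```python
from collections import defaultdict


def split_by_age(records, age_key="age"):
    """Group long-format records into age-specific subsets, ordered by age.

    age_key is a hypothetical column name; replace it with the bioagel
    variable that identifies the questionnaire's age group.
    """
    groups = defaultdict(list)
    for rec in records:
        groups[rec[age_key]].append(rec)
    return dict(sorted(groups.items()))
```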

3.3 Changes in $HGEN

The file HGEN v31.1 now contains the variable gas$$, which gives the household’s gas costs from 2014 onward. In addition, if a household did not respond in a given year, the variables $$eqplif and $$eqpnrj are now carried forward from the previous two years.
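The carry-forward rule for $$eqplif and $$eqpnrj can be sketched as follows. This is our reading of the rule, not the generating code: a value observed in one of the two preceding years fills a missing year, and carried values are not themselves carried further. The function name and the dict-by-year layout are hypothetical; the generated HGEN variables remain the authoritative source.

```python
def carry_forward(values_by_year, max_gap=2):
    """Fill missing years with the last observed value from up to max_gap earlier years.

    values_by_year: dict mapping year -> value, where None marks a missing response.
    """
    filled = {}
    last_year, last_value = None, None
    for year in sorted(values_by_year):
        value = values_by_year[year]
        if value is not None:
            # Observed value: keep it and remember it as the new reference.
            filled[year] = value
            last_year, last_value = year, value
        elif last_year is not None and year - last_year <= max_gap:
            # Missing, but an observation from the previous two years exists.
            filled[year] = last_value
        else:
            filled[year] = None
    return filled
```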

3.4 Other changes in SOEP v31.1

The updates in v31.1 only affected the values of various variables. For detailed information, please see doi:10.5684/soep.v31.1.

Data distribution v30 (Wave BD)

The new data distribution (1984–2013) “SOEP v30” provides, for the most recent survey year 2013, the usual wave-specific data files BDPBRUTTO, BDP, BDPKAL, BDPGEN, BDPAGE17, BDHBRUTTO, BDH, BDHGEN, BDKIND, and BCPLUECKE as well as the updated files with a longitudinal component (PFAD files, biography files, spell data, and weighting factors). Additional new samples, datasets, or variables are listed below:

1. Cross-Sectional Weights 2013

We are pleased that, with the figures now available from the official statistical agencies, we can provide the finalized weighting variables in this version of the data (doi:10.5684/soep.v30). As always in years with refresher and enlargement samples, we provide weights for the old and new samples, both separately and combined. These different sets of weights are designed to make it easier for users to study how the integration of a new sample affects the analysis of specific research topics.

Please also note that the government census carried out in 2011 replaced the population projections, which had been updated regularly on the basis of the last census in 1987, with current population figures from the Federal Statistical Office. This means that the post-stratification of the SOEP weights for wave BD in data release v30 is based on a version of the 2013 Microcensus that takes the 2011 census into account for the first time. Changes in weighted SOEP analyses between 2012 (BC) and 2013 (BD) may therefore result from the switch in official statistics to the more recent census. The effect of the correction is evident in the estimated total number of individuals living in private households in Germany, which fell from 81 million in 2012 to under 80 million in 2013.

Given the retrospective revision of the 2011 and 2012 Microcensus data to account for the census results, our next data release (soep.v31) will include retrospectively revised weighting variables for the 2011 and 2012 survey data.

If you have any comments on the weighting variables, we would be happy to hear from you.

2. New IAB-SOEP Migration Sample (Sample M)

The new IAB-SOEP Migration Sample (Sample M) is a joint project with the Institute for Employment Research (IAB). It is provided both as part of the normal SOEP distribution (see, for example, variable psample in dataset ppfad) and as a separate study including only Sample M households (doi:10.5684/soep.iab-soep-mig.2013).

The new sample takes into account changes in the structure of migration to Germany since 1995. It covers not only direct immigration but also the “second generation,” the children of immigrants. The new sample opens up new perspectives for migration research and provides insights into the lives of new immigrants to Germany. The new sample has the following key features:

  1. The IAB-SOEP Migration Sample substantially increases the sample size for research on migration and the lives of immigrants in Germany: 4,964 persons residing in 2,723 households participated in the first wave of the survey. Moreover, since the survey is included in the regular SOEP as subsample “M”, including migrants from the other SOEP samples in analyses may increase the number of observations further.
  2. The questionnaire used with the new migration sample covers respondents’ entire migration biography, including migration episodes in countries other than Germany. This is an important extension of previous SOEP surveys of immigrants’ personal biographies: for the first time, we can track whether important events in individual biographies occurred in the respondent’s home country, in Germany, or in other destination countries. This also reflects the fact that migration is no longer a one-time event that lasts for a lifetime; individual biographies are becoming increasingly “transnational,” often with several migration episodes during an individual’s lifetime and personal ties in different countries. To make these data easy to use, we created a user-friendly spell dataset, MIGSPELL.
  3. Following recent advances in research on migration and immigration, the IAB-SOEP Migration Sample includes numerous new sets of questions that were not previously covered in the SOEP or other household surveys in Germany, at least not in the necessary depth. Examples of such question blocks are: earnings, labor force status, and occupational status before migration; migration decisions in the family and partnership context; and the purposes and channels of sending remittances.
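Spell data such as MIGSPELL store one record per episode rather than one per survey year. To illustrate how such data can be aggregated, the Python sketch below sums the months a person spent in each country. The tuple layout (country, start month, end month) and the country codes are hypothetical; MIGSPELL's actual variables and time resolution are described in the biographical data documentation.

```python
from collections import defaultdict


def months_by_country(spells):
    """Sum spell lengths in months per country.

    spells: list of (country, (start_year, start_month), (end_year, end_month)),
    with the end month counted as part of the spell (hypothetical layout).
    """
    totals = defaultdict(int)
    for country, (sy, sm), (ey, em) in spells:
        length = (ey - sy) * 12 + (em - sm) + 1
        if length <= 0:
            raise ValueError("spell ends before it starts")
        totals[country] += length
    return dict(totals)


if __name__ == "__main__":
    biography = [("PL", (2008, 1), (2010, 12)),
                 ("DE", (2011, 1), (2011, 6)),
                 ("PL", (2011, 7), (2011, 7))]
    print(months_by_country(biography))  # {'PL': 37, 'DE': 6}
```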

3. New datasets / variables

3.1. MIGSPELL

For the comprehensively surveyed migration biography, we have created a user-friendly spell data set. Detailed documentation will be available in the biographical data documentation of the SOEP.

3.2. BDP_MIG

The original data from the Sample M-specific survey instrument, which combines the individual and the biographical questionnaire, are included in the dataset BDP_MIG. The variables are also included in the other standard or generated datasets:
  • Variables equivalent to those in the individual questionnaire of the other samples are included in the dataset BDP.
  • Variables equivalent to those in the biography questionnaire of the other samples are included in the respective biography dataset (e.g., BIOMARSM).
  • The comprehensively surveyed migration biography can be found in the new dataset MIGSPELL.

3.3. JOBEND$$

Because the categories for the reasons for job termination have changed several times, a new longitudinally consistent variable (JOBEND$$) is now offered in the $PGEN datasets.

3.4. New additional occupation codes

The data on occupations in the individual questionnaire are now additionally coded using KldB2010 and partly also ISCO-08. The following variables are included in the dataset BDP:

Variable name           Variable label
bdp38_kldb2010          Current Occupational Classification (KldB2010)
bdp38_isco08            Current Occupational Classification (ISCO-08)
bdp81_kldb2010          Current Occupational Classification Secondary Employment (KldB2010)
bdp81_isco08            Current Occupational Classification Secondary Employment (ISCO-08)
bdp9005_trainkldb2010   Vocational Training / Education Degree Prev. Yr. (KldB2010)

However, variables of derived scales (e.g., prestige scores in $PGEN) are still based on ISCO-88.

3.5. Grip strength data for 2012

GRIPSTR update: The grip strength data from the 2012 survey year are now included in the GRIPSTR dataset.

3.6. Wealth data for 2012

PWEALTH and HWEALTH updated: In the year 2012, all individuals aged 17 and over were again surveyed on wealth, just as they were in 2002 and 2007. These “raw” data were already part of the standard data distribution for Wave 29 and will be included in the upcoming data distribution in a file containing the data for 2002, 2007, and 2012 in “long format”—the file PWEALTH for individual data, HWEALTH with data aggregated according to household context. Values that are missing due to item or partial unit non-response (e.g., missing interviews with individual household members in interviewed households) will be subjected to multiple imputations in complex procedures taking longitudinal information into account.

3.7. BIOEDU now part of the regular data distribution

Because the beta version of this dataset could not be updated for version 29, the data have now been updated and incorporated into the regular data distribution. Information from the new IAB-SOEP Migration Sample has also been integrated.

3.8. INTERVIEWER dataset

The dataset comprises demographic and employment information about interviewers, aggregated data on the interviewers’ fieldwork in each wave, as well as personal details that they provided in the two interviewer surveys of 2006 and 2012. In the process of creating the INTERVIEWER dataset, all interviewer indicators (INTID) in all of the SOEP datasets were checked thoroughly and in some cases revised.

4. Revisions and Bug fixes

4.1. Corrections in BILZTCH$$ and BILZTEV$$

Until now, the variables BILZTCH$$ and BILZTEV$$ lacked information for a number of waves. As a result, incorrect values were assigned in a number of cases: after the correction, a total of 638 cases previously classified as consistent turned out to be inconsistent increases in educational level, and 2,582 cases previously classified as inconsistent turned out to be consistent.

4.2. Corrections in DUEBSTD

In addition to 1984 and 1985, overtime work has now also been generated for 1987. For these years, overtime hours are calculated as the difference between the number of hours actually worked per week and the contractually agreed weekly working hours.
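The overtime calculation described above amounts to a simple difference. A minimal Python sketch, with a hypothetical function name; clipping negative differences at zero is an assumption of this sketch rather than a documented DUEBSTD rule, and SOEP missing codes (negative values) should be filtered out before calling it.

```python
def weekly_overtime(actual_hours: float, contractual_hours: float) -> float:
    """Overtime as actual minus contractually agreed weekly hours.

    Clipping negative differences at zero is an assumption of this sketch,
    not a documented DUEBSTD rule.
    """
    return max(actual_hours - contractual_hours, 0.0)
```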

4.3. Revisions of marital and relationship status

$FAMSTD: As a result of a new process for generating BIOMARSM/Y and BIOCOUPLM/Y, two changes occurred in $FAMSTD. First, since 2010 the question on marital status has included the categories “registered same-sex partnership, living together” and “registered same-sex partnership, not living together”; these two categories are now also included in $FAMSTD as values “7” and “8”. Second, all spells of BIOMARSM/Y in the category “widowed or divorced” have been set to “not valid” in $FAMSTD. These changes were also applied to previous waves. The variable $FAMSTD is set to -3 if information is implausible, to -5 if persons were not interviewed, and to -1 if persons did not answer the question.

BIOCOUPLM/Y: BIOCOUPLM is generated from the current relationship status and the reported changes in the family situation. Although the questionnaire asks about such events on a monthly basis, many changes in relationship status are not reported as events. The new version of BIOCOUPLM therefore includes a censoring variable, “events”, which indicates whether the exact month of an event is known or whether the beginning or end of a spell reflects the month of the interview because no event was reported. Finally, a new category, “added spell”, has been introduced into the variable remark, which lets you distinguish between spells that have been edited (value 2) and spells that have been added (value 3). For further information, please see the new documentation on BIOMARSM/Y. The variable SPELLTYP is set to -3 if information is implausible.


BIOMARSM/Y: Because BIOMARSM is derived from the new version of BIOCOUPLM, we have copied the category “married, separated” from BIOCOUPLM. It covers the time between a reported separation and the divorce or the death of the spouse. Most of these BIOCOUPLM spells were set to “married” in BIOMARSM, but spells without a reported end were set to “married, separated”, with the end of the spell set to missing. Parallel spells in the category “divorced or widowed” were added, with the start of those spells set to missing. Finally, a new category, “added spell”, has been introduced into the variable remark, which lets you distinguish between spells that have been edited (value 2) and spells that have been added (value 3). For further information, please see the new documentation on BIOCOUPLM/Y. The variable SPELLTYP is set to -3 if information is implausible.

4.4 $regtyp: conversion to urban / rural area

The new settlement-structure typology of the German Federal Institute for Research on Building, Urban Affairs and Spatial Development (BBSR) distinguishes four types of regions. However, using all four categories would make it possible to identify specific administrative districts (Landkreise) in the states of Saxony, Mecklenburg-Western Pomerania, and Baden-Württemberg. We therefore use a condensed two-category classification: urban and rural areas.

Data distribution v29 (Wave BC)

The new data distribution (1984-2012) "SOEP v29" provides, for the most recent survey year 2012, the usual wave-specific data files BCPBRUTTO, BCP, BCPKAL, BCPGEN, BCPAGE17, BCHBRUTTO, BCH, BCHGEN, BCKIND, and BBPLUECKE as well as the updated files with a longitudinal component (PFAD files, biography files, spell data, and weighting factors).

1. New subsample K

In 2012, we added a new refreshment sample with 1,526 new households (Sample K). In total, 12,322 households were interviewed as part of the 2012 fieldwork. As with previous general population samples, refreshment sample K was drawn using a multi-stage stratified sampling design. Its response rate of 34.7% was very similar to that of our previous refreshment sample J. Thus, the general downward trend in participation was successfully stopped through a range of measures, including centralized face-to-face interviewer training, better pay for interviewers, and more attractive incentives for respondents.

In the current refreshment samples, fieldwork is conducted exclusively by CAPI, as it was with the previous refreshment samples H (2006), I (2009), and J (2011). As with our other refreshment samples, data collection focuses on three main questionnaires: the household, the individual, and the youth questionnaire. No supplementary questionnaires were used with respondents in wave 1; focusing on the key questionnaires avoids "overburdening" respondents with a lengthy first interview.

2. Revision of the weighting and estimation procedure

In SOEP v29, the data from subsamples J and K (first collected in 2011 and 2012, respectively) have been adjusted to the German Microcensus with respect to the number of employed people in households of different sizes and the number of private households receiving Unemployment Benefit II (ALG II). This correction prevents the overestimation of households receiving ALG II that occurs in the unweighted samples J and K.

Also, for all newly drawn samples since 1998, a minor adjustment has been made to the definition of households containing foreign nationals. The criterion is no longer the household head but the presence of at least one person of foreign nationality in the household. The revision was made due to a slightly increasing discrepancy between the reference person chosen in the German Microcensus and the household head in the SOEP.

3. New datasets / variables

  • In 2012, the SOEP fielded its wealth module for the fourth time, after 1988, 2002, and 2007. Because of the higher response burden for first-wave respondents, we did not survey wealth in the most recent refreshment sample K (N=1506 households). For estimating totals, we therefore recommend using the cross-sectional household and person weights that cover only the "old" samples A through J and exclude the wave 1 units from Sample K, i.e., BCHRFAJ and BCPHRFAJ.
  • COGNIT: For the short cognitive tests first implemented in the survey year 2006, we can now provide the data from the first repetition, including an additional word knowledge test. The dataset has been renamed from COGNIT06 to COGNIT because both survey years are now included in long format. The first test is documented in detail in Schupp et al. (2008), Erfassung kognitiver Leistungspotentiale Erwachsener im Sozio-oekonomischen Panel (SOEP), DIW Berlin, Data Documentation 32.
  • Two new variables in $PGEN: The variable SNDJOB$$ represents the imputed current gross labor income from a second job, generated for all SOEP respondents who are employed in each respective wave. Information about gross income from a second job was first collected in 1995 (wave L). The respective imputation flag is the variable IMPSND$$.
  • For the first time, respondents were asked about their place of birth. This information, including the coordinates of the respective municipality, is available at our guest workstations at the Research Data Center SOEP.
  • HCONSUM: A new dataset with generated data from the consumption module used in the SOEP in 2010. Detailed documentation is available online.
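For the 2012 wealth data, the recommendation above amounts to estimating totals with the A-J weights. A minimal Python sketch, assuming that the chosen weight (e.g., BCPHRFAJ for persons) is zero for first-wave Sample K units or that these units have been dropped, and that the value column has already been cleaned of missing codes (net wealth can legitimately be negative, so negative values cannot simply be discarded). The function and key names are hypothetical.

```python
def weighted_total(units, value_key, weight_key):
    """Estimate a population total as the sum of weight * value.

    Units with a weight of zero (assumed here for first-wave Sample K units
    under the A-J weights) contribute nothing to the estimate.
    """
    return sum(u[weight_key] * u[value_key] for u in units if u[weight_key] > 0)
```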


4. Improvements and Bug Fixes

  • Revision of the $STELL codes (relationship to the head of household) to differentiate between biological children, stepchildren, adoptive children, etc.:
Value  Wave BB (2011)             Wave BC (2012)
    0  Head Of Household          Head Of Household
    1  Spouse Of HH Head
    2  Life Partner
    3  Son, Daughter
    4  Foster Child
    5  Son, Daughter-In-Law
    6  Father, Mother
    7  Parent-In-Law
    8  Brother, Sister,-In Law
    9  Grandchild
   10  Other Relative
   11  Non-Relative               Spouse Of HH Head
   12  Child of HH-Heads Partner  Same-Sex Spouse
   13  Same-Sex Spouse            Life Partner
   21                             Son, Daughter
   22                             Stepchild (Child of the Partner)
   23                             Adoptive Child
   24                             Foster Child
   25                             Grandchild
   26                             Great-Grandchild
   27                             Son, Daughter-In-Law
   31                             Father, Mother
   32                             Step Father / Step Mother / Spouse of Father or Mother
   33                             Adoptive Father or Mother
   34                             Foster Father or Mother
   35                             Parent-In-Law
   36                             Grandparents
   41                             Brother, Sister
   42                             Half-Brother, Half-sister
   43                             Stepbrother, Stepsister
   44                             Adoptive Brother/Sister
   45                             Foster Brother/Sister
   51                             Brother, Sister -in Law (spouse of brother/sister)
   52                             Brother, Sister -in Law (brother/sister of spouse)
   61                             Aunt, Uncle
   62                             Niece/ Nephew
   63                             Cousin/Cousine
   64                             Other Relative
   71                             Others
   99  Unknown                    Unknown

Please note that this also affects the corresponding variables in the datasets $KIND (and KIDLONG) and BIOPAREN.
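For analyses that span 2011 and 2012, users may want to collapse the finer 2012 codes back to the 2011 scheme. The Python sketch below is a hypothetical helper, not an official SOEP crosswalk: the name `stell_2012_to_2011` and the selection of codes are assumptions. It maps only codes whose labels have an obvious 2011 counterpart in the table above and returns `None` for everything else (e.g. adoptive children, step-parents, half-siblings), which users have to resolve themselves.

```python
from typing import Optional

# Hypothetical crosswalk from 2012 (wave BC) $STELL codes to 2011 (wave BB) codes.
# Only codes with an obvious label match in the table are included.
STELL_2012_TO_2011 = {
    0: 0,    # Head Of Household
    11: 1,   # Spouse Of HH Head
    12: 13,  # Same-Sex Spouse
    13: 2,   # Life Partner
    21: 3,   # Son, Daughter
    22: 12,  # Stepchild (Child of the Partner) -> Child of HH-Heads Partner
    24: 4,   # Foster Child
    25: 9,   # Grandchild
    27: 5,   # Son, Daughter-In-Law
    31: 6,   # Father, Mother
    35: 7,   # Parent-In-Law
    64: 10,  # Other Relative
    71: 11,  # Others -> Non-Relative (assumed correspondence)
    99: 99,  # Unknown
}


def stell_2012_to_2011(code: int) -> Optional[int]:
    """Return the 2011 code, or None if there is no unambiguous counterpart."""
    return STELL_2012_TO_2011.get(code)


print(stell_2012_to_2011(22))  # 12
print(stell_2012_to_2011(23))  # None: adoptive child has no unique 2011 code
```

Returning `None` rather than guessing keeps ambiguous relationships visible in later analyses.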

  • KIDLONG no longer contains the additional variables on birth date (GEBJAHR and GEBMONAT) and sex (SEX); please use the more thoroughly checked versions in PPFAD instead.
  • Last year, we already provided the interviewer data with a new variable, INTID, which is unified across all waves and takes the place of the respective file-specific variables ($INTNR). The new variable was determined through one-time generation of a random number; it is therefore fixed and remains consistent in an integrated master file (not contained in the data distribution) for SOEPcore as well as for FiD (Families in Germany) and SOEP-IS (innovation panel). In addition to generating the INTIDs and updating the interviewer characteristics in INTVIEW, we have also made the following revisions:
    • The dataset INTVIEW no longer contains only the interviewers with known interviewer characteristics but all available interviewer numbers. To provide this information, we extracted all interviewer numbers from all available datasets. Flag variables in INTVIEW show whether further interviewer characteristics are available for a given INTID.
    • A total of 181 INTIDs were newly assigned in the updated data, allowing these to be directly linked with the respective interviewer characteristics. This is due to the assignment of numbers by Infratest in East Germany from 1990 to 1995, when there were still some independent interviewers (IBB-numbers) for the East sample whose numbers were assigned according to a different system. These had to be harmonized with the interviewer numbers that were merged later.
  • BIOAGE03: The codes for personality were changed from 1-11 to 0-10 and are now consistent with the codes for personality in BIOAGE06.
  • BIOAGE06: In 2008, the value zero for the personality items was mistakenly coded as -2. This mistake has been corrected, resulting in up to 65 additional valid cases for some traits in the survey year 2008.
  • $FAMSTD: In v28, the current and previous year were switched for some cases in 2011 when current marital status was generated.
  • In 2012, the questionnaire collected, on a one-time basis, information on the size of the local establishment in addition to the size of the entire company (BETR$$). This additional question revealed that in previous interviews, some individuals had mistakenly reported the local establishment size instead of the entire company size, especially if their entire company had 2,000 or more employees. Because longitudinal consistency is important, these persons were identified, and their 2012 original value of the entire company size (BETR12) was replaced by their value of the local establishment size. These modifications also affected the variable ALLBET12. Please see the data documentation for further details.
  • The variable RUEBSTD ("overtime hours during last month" in 2001) contained incorrect non-response missings (-1) because respondents without overtime had mistakenly been assigned to this category. In the corrected version, these respondents are coded as zero overtime hours.
  • For the variable vh4601 and its equivalents in subsequent years, the label "contributions over 2,500 euros" was used, although the questionnaire actually asked about "contributions over 500 euros". The label has been corrected.
  • The variables ZERWZEIT and BAERWZEIT ("length of time with firm" in 2009 and 2010) had to be corrected for respondents in sample I who did not have their wave 2009 interview and wave 2010 interview in the respective year but at the beginning of the following year (2010 and 2011). Due to the longitudinal consistency check, these individuals mistakenly received an implausible value (-3) for BAERWZEIT. In the corrected version, the non-missing values of these respondents are considered to be valid and not set to missing.
  • LOC1989: Persons who never participated are now included in the data generation. As a result, -2 now means "does not apply, born after 1989", as intended for this variable. Persons who never participated and for whom no information could be obtained from other sources were set to -1 ("no answer").
  • The variables EXPFT$$, EXPPT$$, and EXPUE$$ (experience in full-time employment, part-time employment, and unemployment) have been improved. They now reflect the total length of full-time employment, part-time employment, or unemployment in the respondent's career up to the interview in a given year (instead of only up to December of the previous year). Since monthly employment activities are collected retrospectively in the following year, these variables cannot be updated for the most recent wave.
  • The variable AHINC$$ in dataset $HGEN is no longer part of the data distribution. We recommend using the completely (multiply) imputed monthly net household income from the variables I$HINC$$ (or the dataset MIHINC in long format over all years).
  • The variables ATATZEIT, AVEBZEIT, AUEBSTD and AERWZEIT were mixed up in the data distribution v28 and had to be corrected:
    • The correct values of ATATZEIT were found in the variable AERWZEIT.
    • The correct values of AVEBZEIT were found in the variable ATATZEIT.
    • The correct values of AUEBSTD were found in the variable AVEBZEIT.
    • The correct values of AERWZEIT were found in the variable AERWZEIT of the data distribution v27.
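The recommended route is simply to use the corrected release. For anyone who still has an extract built from v28, the reassignment described above can be expressed as the following Python sketch. It assumes, purely for illustration, that one person's values are held in a plain dictionary and that the matching v27 value of AERWZEIT is at hand; the function name `correct_v28_hours` is hypothetical.

```python
def correct_v28_hours(v28_record: dict, v27_aerwzeit):
    """Reassign the mixed-up v28 values to their correct variable names."""
    return {
        "ATATZEIT": v28_record["AERWZEIT"],  # correct ATATZEIT was stored in AERWZEIT
        "AVEBZEIT": v28_record["ATATZEIT"],  # correct AVEBZEIT was stored in ATATZEIT
        "AUEBSTD": v28_record["AVEBZEIT"],   # correct AUEBSTD was stored in AVEBZEIT
        "AERWZEIT": v27_aerwzeit,            # correct AERWZEIT comes from v27
    }


record_v28 = {"ATATZEIT": 38.5, "AVEBZEIT": 2.0, "AUEBSTD": 7.0, "AERWZEIT": 40.0}
print(correct_v28_hours(record_v28, 12.3))
```

Note that the v28 value stored under AUEBSTD is not used by any of the correct variables and is therefore dropped.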

Data distribution v28 (Wave BB)

1. New additional missing codes

With the integration of sample J in 2011, the biographical questionnaire was moved from the second to the first wave and combined with the individual questionnaire into an integrated survey. As a result, there are some slight differences in the survey instrument between the old samples A-H and the supplementary sample J.

The following additional missing codes have been introduced to the survey data to document these possible differences:

-4 "Inadmissible multiple response"
-5 "Not included in this version of the questionnaire"
-6 "Version of questionnaire with modified filtering"
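Because all of these codes are negative, they must be excluded before computing statistics on the survey data. The Python sketch below illustrates one way to do this; the dictionary and the helper `valid_or_none` are hypothetical names, and the labels for -1 to -3 are taken from how those codes are described elsewhere in these notes.

```python
from typing import Optional

# Negative SOEP missing codes: -1 to -3 as described elsewhere in these notes,
# -4 to -6 introduced with v28.
MISSING_CODES = {
    -1: "No answer",
    -2: "Does not apply",
    -3: "Implausible answer",
    -4: "Inadmissible multiple response",
    -5: "Not included in this version of the questionnaire",
    -6: "Version of questionnaire with modified filtering",
}


def valid_or_none(value: int) -> Optional[int]:
    """Map any documented missing code to None and keep valid values."""
    return None if value in MISSING_CODES else value


answers = [3, -5, 7, -1, 0, -6]
valid = [v for v in answers if valid_or_none(v) is not None]
print(valid)  # [3, 7, 0]
```

Testing against the explicit code list, rather than `value < 0`, keeps the sketch safe for any variable that could legitimately take negative values.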

2. Sample I now part of our new Innovation Sample

The SOEP Innovation Sample has now been launched and includes, inter alia, sample I. As of 2011, Sample I is therefore no longer part of the main survey. See SOEP-IS on our website for further information about the Innovation Sample and the possibility of including your own questions.

3. New and renamed datasets

3.1 BIOCOUPLM
BIOCOUPLM provides spell data on partnership histories from the first to last personal interview of a respondent. Spells are measured on a monthly basis.

3.2 BIOCOUPLY
BIOCOUPLY provides spell data on partnership histories. It contains annual information on partnership status since the respondent’s year of birth, including available retrospective data and annually updated information.

3.3 BIOSIB (beta version)
The new file BIOSIB provides information on siblings living in the SOEP households. The dataset contains the person numbers of all siblings in an observed family. It includes information on their gender, their year of birth, and on the relationship between the observed siblings.
BIOSIB is included as a beta version in the current data release. Please do not hesitate to send positive or negative feedback and suggestions to Daniel Schnitzlein.

3.4 BIOEDU
The BIOEDU dataset contains details on educational transitions beginning with entry into childcare up to tertiary education in a consistently structured form.

3.5 BIOAGE long
In the new integrated BIOAGE long dataset (BIOAGEL), data are presented in “long” format, i.e., this dataset contains information from BIOAGE01, BIOAGE03, and BIOAGE06, as well as BIOAGE08A and BIOAGE08B.

3.6 TRUST
Dataset on the Economic Behavior Experiment on Trust and Trustworthiness in the 2003, 2004, & 2005 SOEP Survey

This experiment to measure trust is based on the investment game introduced by Berg et al. (1995), a one-shot game for two players (movers) who interact anonymously. The first mover receives an endowment of 10 points and can transfer zero to ten points to the second mover. Every point transferred is doubled by the experimenters. The second mover is also given an endowment of ten points. After receiving points from the first mover, he or she decides how much of the endowment to transfer back to the first mover (zero to ten points). As with the first mover's transfer, the back-transfer by the second mover is doubled by the experimenters. After the second mover's decision, the game ends and the subjects are paid their income in euros (one point equals one euro) by a check sent a few days later.

A fundamental component of the game is that the participants actually receive money in accordance with the fixed payout function, i.e., all the decisions always have monetary consequences. This version of the game was developed by Fehr, Fischbacher, Schupp, von Rosenbladt & Wagner (2002).
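The payout rule described above can be written out as a short Python sketch. It follows the rules as stated in this section (endowment of 10 points each, both transfers doubled, back-transfer of 0 to 10 points); the function and constant names are illustrative and not part of the TRUST dataset.

```python
from typing import Tuple

ENDOWMENT = 10   # points given to each mover
MULTIPLIER = 2   # every transferred point is doubled by the experimenters


def trust_game_payoffs(sent: int, returned: int) -> Tuple[int, int]:
    """Payoffs (first mover, second mover) in points; one point equals one euro."""
    if not 0 <= sent <= ENDOWMENT:
        raise ValueError("first mover can send 0 to 10 points")
    if not 0 <= returned <= ENDOWMENT:
        raise ValueError("second mover can send back 0 to 10 points")
    first = ENDOWMENT - sent + MULTIPLIER * returned
    second = ENDOWMENT + MULTIPLIER * sent - returned
    return first, second


print(trust_game_payoffs(5, 4))  # (13, 16)
```

Because both transfers are doubled, the total payout is 20 + sent + returned points, so mutual trust and trustworthiness increase the joint income.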

The combination of representative survey and behavioral experiment was used in the SOEP main surveys in 2003, 2004, and 2005, with only minor modifications. Of the 1,432 original participants in 2003, 1,202 also took part in the experiment in 2004 and 2005.

The data are available in long format in the "TRUST" dataset. Consequently, this dataset contains information from each of the three waves in which the behavioral experiment was conducted.

3.7 TIMEPREF
Dataset on the Economic Behavior Experiment on Time Preferences in the 2006 SOEP Survey

In this experiment on economic behavior, respondents were asked how they would like to receive €200 in prize money: whether they would rather receive it immediately by check, or whether they would prefer to wait and receive a larger amount later, that is, with interest. By splitting the sample (N = 1,503 persons) into random subsamples (splits), it was possible to vary both the time horizon and the implied interest rate in order to test possible incentive effects on the choice between a lower payoff in the short term and a higher payoff in the long term. The scientific director of the project was Prof. Dr. Armin Falk, CENs, University of Bonn.
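To make the link between the delayed amount, the horizon, and the implied interest rate concrete, here is a small Python sketch. It assumes annual compounding, which is our illustrative choice; the experiment's own definition of the implied rate is not given here, and the delayed amounts and horizons below are hypothetical rather than the values used in the splits.

```python
def implied_annual_rate(now: float, later: float, months: float) -> float:
    """Annual rate at which `now` grows to `later` over `months` (annual compounding assumed)."""
    if now <= 0 or later <= 0 or months <= 0:
        raise ValueError("amounts and horizon must be positive")
    return (later / now) ** (12 / months) - 1


# EUR 200 now versus hypothetical delayed amounts
print(round(implied_annual_rate(200, 220, 12), 4))  # 0.1
print(round(implied_annual_rate(200, 210, 6), 4))   # 0.1025
```

The same absolute premium implies a higher annual rate the shorter the waiting period, which is why varying the horizon and the delayed amount separately can disentangle the two.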

4. New or revised variables

4.1 $HBRUTTO dataset

REGTYP$$:
The $HBRUTTO dataset includes a new variable that distinguishes between urban, suburban, and rural regions. It is based on the spatial categories of counties (as of December 31, 2009) used by the Federal Institute for Research on Building, Urban Affairs and Spatial Development (BBSR). The following spatial structure characteristics are used to define the categories:

  • Share of county’s population in large or medium-sized cities
  • Population density of the county
  • Population density of the county without taking large or medium-sized cities into consideration

Thus, three categories can be defined:

  1. Urban regions (Cities with at least 100,000 inhabitants and counties with at least 50% of the population living in large or medium-sized cities and with a population density of at least 150 inhabitants/km²; and counties with a population density not including large or medium-sized cities of at least 150 inhabitants/km²)
  2. Regions undergoing urbanization (Counties with at least 50% of the population living in large or medium-sized cities but a population density of below 150 inhabitants/km², and counties with less than 50% of the population living in large or medium-sized cities, and with a population density (excluding large or medium-sized cities) of at least 100 inhabitants/km²)
  3. Rural regions (counties with less than 50% of the population living in large or medium-sized cities and population density (excluding large or medium-sized cities) of below 100 inhabitants/km²).
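REGTYP$$ is delivered precomputed, so users never need to derive it. Still, the decision rules above can be read as the following Python sketch. The function `regtyp` and its inputs are hypothetical: it assumes the population share is given as a fraction (0.0 to 1.0), densities are given in inhabitants per km², and `is_large_city` marks a city with at least 100,000 inhabitants. The checks are applied in the order the categories are listed.

```python
def regtyp(is_large_city: bool, share_in_cities: float,
           density: float, density_excl_cities: float) -> int:
    """Return 1 (urban), 2 (undergoing urbanization) or 3 (rural).

    share_in_cities: share of the county population living in large or
    medium-sized cities; densities in inhabitants per square kilometre.
    """
    if is_large_city:  # city with at least 100,000 inhabitants
        return 1
    if share_in_cities >= 0.5 and density >= 150:
        return 1
    if density_excl_cities >= 150:
        return 1
    if share_in_cities >= 0.5:  # overall density below 150
        return 2
    if density_excl_cities >= 100:
        return 2
    return 3


print(regtyp(False, 0.3, 120.0, 110.0))  # 2
```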

 

4.2 $PGEN dataset

BILZTCH$$ / BILZTEV$$:
BILZTCH$$ indicates whether the respondents’ answers suggest a downward shift in years of education or training ($BILZEIT) since the last observation or an upward change since the last year which is inconsistent with additional information on education or training recently completed.
BILZTEV$$ is a flag variable which indicates whether the respondent showed some inconsistent change in $BILZEIT, either upwards or downwards, over the entire observation period.
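The logic of the two flags can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the SOEP generation code: it assumes one $BILZEIT value per observed wave plus a hypothetical indicator of whether new education or training was reported as completed since the previous observation, and it never flags the first observed wave.

```python
from typing import List, Sequence, Tuple


def bilzeit_flags(bilzeit: Sequence[float],
                  completed_since_last: Sequence[bool]) -> Tuple[List[int], int]:
    """Hypothetical BILZTCH-/BILZTEV-style flags for one respondent.

    bilzeit[i] is $BILZEIT in the i-th observed wave; completed_since_last[i]
    says whether the respondent reported completing education or training
    since the previous observation.
    """
    if len(bilzeit) != len(completed_since_last):
        raise ValueError("both sequences need one entry per wave")
    if not bilzeit:
        return [], 0
    change_flags = [0]  # the first observation cannot be compared with anything
    for i in range(1, len(bilzeit)):
        downward = bilzeit[i] < bilzeit[i - 1]
        unexplained_upward = bilzeit[i] > bilzeit[i - 1] and not completed_since_last[i]
        change_flags.append(int(downward or unexplained_upward))
    ever_flag = int(any(change_flags))  # inconsistent change at any point
    return change_flags, ever_flag


print(bilzeit_flags([10.5, 10.5, 12.0, 11.5], [False, False, True, False]))
# ([0, 0, 0, 1], 1)
```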

$VEBZEIT and $UEBSTD

To be consistent with the FiD dataset, the missing values of the variables $VEBZEIT and $UEBSTD were slightly recoded: the missing value –2 is now assigned to self-employed individuals. In previous waves, self-employed persons had the missing value –3 (implausible answer).

For $UEBSTD, the value –3 (implausible answer) is assigned to all individuals with more than ten hours of weekly overtime AND who also had an agreed working time of over 80 weekly hours ($VEBZEIT is implausible, value –3) or actual weekly working time of more than 80 hours a week ($TATZEIT is implausible, value –3).
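The two recoding rules can be summarized in a Python sketch for a single respondent. It assumes raw weekly hours as inputs and covers only the rules stated in this section; the function and constant names are hypothetical, and the actual generation may apply further checks.

```python
from typing import Tuple

NOT_APPLICABLE = -2  # now used for self-employed respondents
IMPLAUSIBLE = -3


def recode_vebzeit_uebstd(self_employed: bool, vebzeit: float,
                          tatzeit: float, uebstd: float) -> Tuple[float, float]:
    """Return the recoded ($VEBZEIT, $UEBSTD) pair for one respondent."""
    if self_employed:
        return NOT_APPLICABLE, NOT_APPLICABLE
    vebzeit_out = IMPLAUSIBLE if vebzeit > 80 else vebzeit
    # more than 80 agreed or actual weekly hours makes the hours implausible
    hours_implausible = vebzeit > 80 or tatzeit > 80
    uebstd_out = IMPLAUSIBLE if uebstd > 10 and hours_implausible else uebstd
    return vebzeit_out, uebstd_out


print(recode_vebzeit_uebstd(False, 40, 85, 12))  # (40, -3)
```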

4.3 BIOPAREN dataset

Seven new variables have been added to BIOPAREN:
VAORT11 and MAORT11 indicate the father's and the mother's current place of residence, respectively.
GESCHW, GESCHWUP, NUMS, NUMB, and TWIN provide information on siblings. The variable GESCHW indicates whether the respondent had ever had any siblings as of the interview. GESCHWUP gives the year in which the sibling information was collected. NUMB and NUMS give the number of brothers and sisters the respondent reports, and TWIN indicates whether any of these are twin siblings of the respondent (and of which type).

Data distribution v27 (Wave BA)

The release of the 1984-2010 SOEP data (waves A-BA) will contain the usual year-specific data files (BAP, BAH, BAPGEN, BAHGEN, BAPKAL, BAPBRUTTO, BAHBRUTTO, BAKIND and ZPLUECKE) and the updated files with a longitudinal component (PFAD files, biography files, spell data, and weighting factors). The respondents of Sample I (Incentives Sample) answered the biographical background questionnaire for the very first time in 2010.
Since minor changes have been made to many of the older datasets as well, we strongly recommend reinstalling all of the datasets from the new DVD.

1. New two-letter prefix (BA)

This SOEP data release (v27) will include, for the first time in the survey's 27 years, a two-letter rather than a single-letter wave prefix. Since we came to the end of the Latin alphabet with the letter Z in our last data release, we decided to use the wave prefix BA for the cross-sectional data format.

2. Updated beta version in "long format"

The SOEP data are now also available in "long format" as a beta version in addition to the usual data format. SOEPlong refers to a compressed form of the SOEP data. Rather than being provided as wave-specific individual files, all available years and cohorts are pooled (long format). The data are available on the second DVD. For details, see SOEPnewsletter No. 90/2010.
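The difference between the two formats can be illustrated with a small Python sketch that pools wave-specific records into one row per person and survey year. It is only a toy model: the identifier and variable names (`person_id`, `year`, `income`) are hypothetical, and it assumes the variables have already been given wave-independent names, which is exactly the harmonization work SOEPlong takes over. The wave prefixes and years are those listed in these notes.

```python
from typing import Dict, List

# Wave prefixes and survey years as given in these release notes.
WAVE_YEARS = {"Y": 2008, "Z": 2009, "BA": 2010}


def to_long(wave_files: Dict[str, Dict[int, dict]]) -> List[dict]:
    """Pool wave-specific records {prefix: {person_id: {var: value}}} into long rows."""
    rows = []
    for prefix, records in wave_files.items():
        year = WAVE_YEARS[prefix]
        for person_id, values in records.items():
            rows.append({"person_id": person_id, "year": year, **values})
    # one row per person and survey year, sorted person by person
    rows.sort(key=lambda row: (row["person_id"], row["year"]))
    return rows


long_rows = to_long({
    "Z": {101: {"income": 1800}},
    "BA": {101: {"income": 1900}, 102: {"income": 2500}},
})
for row in long_rows:
    print(row)
```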

3. Elimination of fakes

When the data for the second wave of our newest sample I were checked, 36 households were identified as faked interviews and will therefore no longer be included in this data release.

4. New and renamed datasets

4.1 BIOAGE08[A|B]

The BIOAGE08 dataset contains data from the new "parent questionnaire", which is given to the mothers and fathers of seven- to eight-year-old children. Data are thus now available on the 2002/2003 birth cohorts that were first observed with the "newborn questionnaire." Because the new "parent questionnaire" is given to both mothers and fathers, it provides two sets of responses for many of the children in the sample. The file was therefore split in two on the basis of the parent's gender and the type of household the respondent lives in. BIOAGE08A includes mothers and, where no information from the mother was available, some fathers. BIOAGE08B includes fathers only. The documentation of this new dataset is included as a new chapter in our documentation on biography and life history data in SOEP (coming soon).

4.2 LIFESPELL

The LIFESPELL dataset contains data from the follow-up studies of SOEP dropouts (1992, 2001, 2006, and 2008), which were not previously included in the regular data release. The follow-up studies, based on information from public registers, serve to identify the current residence of former SOEP respondents, and thus allow studies of life expectancy and decisions to emigrate for a large percentage of SOEP respondents, even long after they have dropped out of the study. The documentation of this new dataset is included as a new chapter in our documentation on biography and life history data in SOEP (coming soon).

4.3 BIOEDU (beta version)

The BIOEDU dataset, which is being released this year in provisional form (beta release), contains details on educational transitions beginning with entrance into childcare up to tertiary education in consistently structured form. Users who work with these data are requested to report on their experiences (especially any problems they might have), so that a final version can be released next year.
Detailed documentation is in the DIW Data Documentation 58.

Because of its provisional form, this dataset is not part of the normal distribution and you will find the data on the DVD in an extra archive (link available in the NEWS file on the DVD).

5. New Variables

5.1 $PEQUIV dataset

In the $PEQUIV-files, there will be a new additional variable on support payments. With the 2010 questionnaire, the SOEP has split the item "support payments" into two separate items. The first one now collects information on "alimony from legal spousal support, child support, and child care support" (ALIM$$), while the second item asks about "advance child maintenance payments" (IACHM$$). More information about the $PEQUIV files and the new variables is available in the DIW Data Documentation 57.

5.2 PFLEGE dataset

The PFLEGE file now includes two new additional variables. "FURTHER" gives the number of further persons requiring help in the household. This question has been asked since 2009. "CARECOST" represents the regular monthly costs for care that a household normally spends. This question has been asked since 2010.

5.3 Dataset $PGEN

We are now providing detailed data on educational degrees and training qualifications prior to joining the panel: life course questionnaires have been distributed since 2001 to collect data on apprenticeship occupation, type of qualification (e.g., diploma), and the field of study for those who have obtained a degree. Up to now, data from these open-answer questions were not included in the data release. From now on, however, these data will be released in coded form. The classifications used for the data from the individual questionnaire have been slightly modified in the process of these revisions. A more detailed description is available within the PGEN documentation.
The new variables are

FIELD$$ Field of tertiary education
DEGREE$$ Type of tertiary degree
TRAINA$$ Apprenticeship - two-digit occupation KldB92
TRAINB$$ Vocational school - two-digit occupation KldB92
TRAINC$$ Higher vocational school - two-digit occupation KldB92
TRAIND$$ Civil servant training - two-digit occupation KldB92
FDT_F$$ Data source FIELD, DEGREE, TRAIN

6. Revised Variables

6.1 $P dataset

Name changes to the variables in the different classifications for occupation and sector in $P: the variables contained in the $P datasets are collected in alternate years from all respondents and from those individuals who changed occupations. Simultaneously, we also generate and distribute all the information on all years and all individuals in the $PGEN datasets. To more clearly distinguish the generated variables from the originally surveyed variables, and to establish a clear connection to the question number in the respective questionnaire, we have renamed the variables according to the following system:

 

Old variable name New variable name
$IS88 $pXX_IS88
$KLAS $pXX_KLAS
$NACE $pXX_NACE

Thus, ZIS88, for example, is now ZP29_IS88. The corresponding variables of all datasets from the individual questionnaire ($P) have been renamed, but not the variables recommended for use from the generated datasets ($PGEN, e.g., IS8809 from ZP).
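The naming rule can be sketched in Python. The helper `renamed_variable` is hypothetical, and it needs the question number as an input because that number differs by questionnaire and has to be looked up there; apart from the documented ZIS88 example, the question numbers used below are made up.

```python
CLASSIFICATIONS = ("IS88", "KLAS", "NACE")


def renamed_variable(old_name: str, question_no: int) -> str:
    """Build the new $pXX_... name, e.g. ZIS88 with question 29 -> ZP29_IS88."""
    for suffix in CLASSIFICATIONS:
        if old_name.endswith(suffix) and len(old_name) > len(suffix):
            wave_prefix = old_name[: -len(suffix)]  # works for one- and two-letter prefixes
            return f"{wave_prefix}P{question_no}_{suffix}"
    raise ValueError(f"{old_name!r} does not end in a known classification suffix")


print(renamed_variable("ZIS88", 29))  # ZP29_IS88
```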

6.2 $HBRUTTO dataset

The coding of the variable identifying the federal state ($BULA, Bundesland) in which a household is located is now consistent with the coding used in official statistics.

$bula (old codings)           $bula (new codings)
 0  Berlin
 1  Schleswig-Holstein         1  Schleswig-Holstein
 2  Hamburg                    2  Hamburg
 3  Niedersachsen              3  Niedersachsen
 4  Bremen                     4  Bremen
 5  Nordrhein-Westfalen        5  Nordrhein-Westfalen
 6  Hessen                     6  Hessen
 7  Rheinl.-Pfalz, Saarl.      7  Rheinland-Pfalz
 8  Baden-Wuerttemberg         8  Baden-Wuerttemberg
 9  Bayern                     9  Bayern
                              10  Saarland
11  Berlin (East)             11  Berlin
12  Mecklenburg-Vorpommern    12  Brandenburg
13  Brandenburg               13  Mecklenburg-Vorpommern
14  Sachsen-Anhalt            14  Sachsen
15  Thueringen                15  Sachsen-Anhalt
16  Sachsen                   16  Thueringen

East and West Berlin can still be distinguished by combining $BULA with $SAMPREG (sample region in $PPFAD).
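When combining older extracts with the new coding, the table above can be applied as a lookup. The Python sketch below uses hypothetical names. Old code 7 (Rheinland-Pfalz and Saarland combined) has no single new code, so it is returned as `None` instead of being silently assigned to Rheinland-Pfalz.

```python
from typing import Optional

# Old $BULA code -> new $BULA code, read off the table above.
OLD_TO_NEW_BULA = {
    0: 11,    # Berlin -> Berlin
    1: 1, 2: 2, 3: 3, 4: 4, 5: 5, 6: 6,
    7: None,  # Rheinland-Pfalz and Saarland were combined; cannot be split
    8: 8, 9: 9,
    11: 11,   # Berlin (East) -> Berlin
    12: 13,   # Mecklenburg-Vorpommern
    13: 12,   # Brandenburg
    14: 15,   # Sachsen-Anhalt
    15: 16,   # Thueringen
    16: 14,   # Sachsen
}


def recode_bula(old_code: int) -> Optional[int]:
    """Translate an old $BULA code into the new official coding."""
    if old_code not in OLD_TO_NEW_BULA:
        raise ValueError(f"unknown old $BULA code: {old_code}")
    return OLD_TO_NEW_BULA[old_code]


print(recode_bula(16))  # 14 (Sachsen)
```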

6.3 Minor bug fixes

  • In the BIOAGE17 file from data release v26, the variables classifying the preferred job reported in the youth questionnaire (byklas, bymps, byisco88, byegp, byisei, bysiops) were incorrectly coded and contained too many missing values. This bug has been fixed in data release v27.
  • Some households in Berlin (in 2006 only) were misclassified in the settlement-pattern variable ($GTYP) in the dataset GGKBOU. This bug has also been fixed.

 

Data distribution v26 (Wave Z)

The 2010 data distribution (data for the years 1984-2009) includes comprehensive improvements, additions, and modifications. For the most recent survey year 2009, it also provides the usual wave-specific data files ZPBRUTTO, ZP, ZPKAL, ZPGEN, ZPAGE17, ZHBRUTTO, ZH, ZHGEN, ZKIND, and YPLUECKE as well as the updated files with a longitudinal component (PFAD files, biography files, spell data, and weighting factors).

1. Beta version in "long format"

The SOEP data are being provided for the first time ever as a beta version in "long format" in addition to the standard data format. SOEPlong refers to a compressed form of the SOEP data: rather than being provided as wave-specific individual files, all available years and cohorts are pooled (long format).

2. New Subsample I
As part of the recent SOEP innovations, fieldwork began in fall 2009 on a new subsample (Sample I). The subsample is currently being used to test the effect of different incentive strategies on participation in the SOEP and it will become part of the innovation sample. See SOEPnewsletter 89 for more on this new sample.
In four randomly assigned groups, the following strategies were used:

  1. SOEP standard incentives (one lottery ticket per respondent),
  2. Choice of either a lottery ticket or five euros per individual interview,
  3. Five euros per individual interview,
  4. Ten euros per individual interview.

The data from the new Sample I have been included in the new release of SOEP data (SOEP v26), but due to the particular features of the subsample, it does not have an integrated weighting framework with the rest of the SOEP samples. For Sample I, we are conducting a mail survey of all non-participants in the four groups. Since this is the first wave of Sample I, we were not able to integrate its biographical information into the existing biography files. The same applies to the biographical information in the dataset PPFAD; for example, the variable MIGBACK is set to -2 for all Sample I respondents.

Apart from that the following additions and modifications have been made:

3. New and Renamed Datasets 

  • Data on cognitive tests (COGDJ)
    For the first time, all available data on the cognitive tests of young people ("Denksport Jugend", DJ) are included in the SOEP data release. Since 2006, these tests have been given to young respondents (aged 16) the first time they participate in the SOEP survey.

4. New Variables

4.1 Dataset $HGEN
Two new variables describing the quality of the dwelling:

  • EQPLIF$$ "Dwelling has an elevator"
  • EQPNRJ$$ "Dwelling has alternative energy source"

4.2 Dataset $PEQUIV:

  • There is a new variable on additional child benefits together with the corresponding imputation flag variable (ADCHB$$ and FADCHB$$)

4.3 Dataset $HBRUTTO - Calendar Year of Interview

  • For the first time, we now distribute a variable (ZDATUMY) giving the calendar year of the interview. Because of the additional Sample I and the resulting longer fieldwork period, there were a few cases with a successful interview in 2010.

5. Revised Variables

5.1 Datasets $HGEN  

  • The variables on household type TYP1HH$$ and TYP2HH$$ were completely revised and tested for intertemporal consistency.

5.2 Datasets $KIND – KIDLONG

  • These variables were also completely revised and are now provided in longitudinal form (KIDLONG) as well as in cross-sectional form ($KIND). This made it necessary to change the variable names in KIDLONG so that they are consistent over time.

5.3 Datasets BIOMARSM/BIOMARSY  

  • The biographical data set on marital status was revised.

5.4 Dataset BIOTWIN

The dataset BIOTWIN contains 100 additional cases since wave Z. This considerable increase is due to an adjustment in the data generation procedure: in contrast to the previous generation, all siblings with an identical year of birth are now considered twins if the information on the month of birth is missing. This less restrictive rule is based on the assumption that two separate births within a single calendar year are rare. Nevertheless, the share of false positives among twin groups with a missing month of birth is likely to be higher than the BIOTWIN average. A new value label was therefore introduced for the variable INFOTWIN to flag these twin groups for the user (code 6: coverage since 2007, congruent year of birth, missing month; in contrast, code 5: coverage since 2007, congruent year and month of birth).
In its current state (wave Z) the dataset BIOTWIN covers 250 sets of twins and 5 sets of triplets.

infotwin: 
[1] Twins - Not in 2006 (gen.)
[2] Twins - 2006 (Answer Not Verifiable)
[3] Twins - 2006 (Answer Refused)
[4] Twins - 2006 (Answer Validated)
[5] Twins - since 2007 (gen.; congruent year and month of birth)
[6] Twins - since 2007 (gen.; congruent year of birth, missing month)
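The generation rule for codes 5 and 6 can be sketched in Python for the siblings of one family. This is an illustration under assumptions, not the BIOTWIN generation code: the input format (person id, birth year, birth month or `None`) and the function name are hypothetical, and the codes 1 to 4 based on the 2006 survey answers are not modeled.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

Sibling = Tuple[int, int, Optional[int]]  # (person id, birth year, birth month or None)


def twin_groups(siblings: List[Sibling]) -> List[Tuple[List[int], int]]:
    """Return (sorted person ids, INFOTWIN-style code) for each twin group in one family."""
    by_year: Dict[int, List[Tuple[int, Optional[int]]]] = defaultdict(list)
    for person_id, year, month in siblings:
        by_year[year].append((person_id, month))

    groups = []
    for year in sorted(by_year):
        members = by_year[year]
        if len(members) < 2:
            continue
        if any(month is None for _, month in members):
            # less restrictive rule: same year, month missing -> code 6
            groups.append((sorted(p for p, _ in members), 6))
            continue
        by_month: Dict[int, List[int]] = defaultdict(list)
        for person_id, month in members:
            by_month[month].append(person_id)
        for month in sorted(by_month):
            if len(by_month[month]) >= 2:
                groups.append((sorted(by_month[month]), 5))  # same year and month
    return groups


family = [(1, 1990, 5), (2, 1990, 5), (3, 1992, None), (4, 1992, 3), (5, 1995, 1)]
print(twin_groups(family))  # [([1, 2], 5), ([3, 4], 6)]
```

Groups of three siblings with the same year and month (triplets) come out as a single group.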

 

5.5 Minor bugs fixed

  • Correction of MONTH08
  • Correction of a very small number of cases in IMMIYEAR
  • Change in the variable names for questions 25 and 26 in YH and ZH

Data distribution 2008 (Wave Y)

The new dataset (waves 1-25, 1984-2008) contains extensive improvements, additions, and modifications. Besides the usual wave-specific data files YPBRUTTO, YP, YPKAL, YPGEN, YHBRUTTO, YH, YHGEN, YKIND, and XPLUECKE, it includes the updated files with a longitudinal component (PFAD files, biography files, spell data, and weighting factors).
We now also provide, in a beta release, the data in a more user-friendly format called "SOEPlong". We announced this in SOEPnewsletter 80/2008 and thank all those who provided input on this issue. The new and preliminary version of the SOEP data in long format can be obtained upon request. We suggest that only "power users" who would like to work with us on improving data management order this version of the data. This version contains all data and can thus essentially already be used for final analyses; it is, however, a preliminary version. We do not recommend the new format for inexperienced users. New SOEP users who want to work with the new format should at least be familiar with other panel datasets.

The most important improvements in the new data distribution are listed in the following:

1. New Datasets

1.1 Dataset BIOAGE06
The new data distribution contains the new file BIOAGE06. For the first time in 2008, it includes the information collected using the special mother-child questionnaire, usually from mothers of five- to six-year-old pre-schoolers. The data thus cover a birth cohort that was first “surveyed” in 2002/2003 with a special Newborn Questionnaire. The new data on pre-school-age children cover children’s height and weight, health, care situation, activities with and without the mother, and media usage. Detailed questions address the care situation. Furthermore, valid information on the child’s personality (based on the “Big Five” personality traits indicator in the main questionnaire for adults) and on their socio-emotional behavior (surveyed with a modified version of the Strengths and Difficulties Questionnaire) is collected for the first time.

1.2 Dataset MOVEDIST
We provide a new dataset on changes of residence. Based on geocoordinates at block level, we provide the distance (in meters) between the former and the present residence. However, this information is only available for moves since 2000 and is NOT included on this DVD. We distribute these data together with data on the spatial planning regions (ROR) on a separate CD-ROM. To use these data, you need an extended data distribution contract including a data protection concept. After signing the contract extension, you will receive the data on CD-ROM (at no additional cost).

2. New Variables

2.1 Dataset PPFAD

  • MIGBACK / MIGINFO: MIGBACK provides time-invariant information on an individual’s migration background resulting from own and parental data. MIGINFO indicates the sources of the information used in order to provide users with highest possible transparency. A detailed description is available in the extensive biography documentation (see chapter on PPFAD).


2.2 Dataset PFLEGE

  • Pay / Stufe: two new variables on paid care (PAY) and the care level (STUFE) according to the German compulsory long-term care insurance.


2.3 Dataset PBIOSPE

The data generation process has been updated completely but without changing the basic principles. Therefore, there are only a few barely discernible deviations in the main variables (due to slight changes in the consistency checks of the data). But there are a number of visible changes in the form of additional variables or additional values in already existing variables. A detailed description is available in our documentation on biography and life history data.

2.4 Dataset BIOPAREN

  • BIO: origin of the information ($LELA or $JUGEND).
  • ALTER / VALTER / MALTER: age of the respondent / father / mother, all at the time of the biography interview.
  • Attention: A bug was discovered in the dataset shortly after the DVD was completed. For updated information on parental religious affiliation, please see our Known Bugs/Fixes page.


3. Revised Variables

3.1 Dataset PWEALTH and HWEALTH
In 2007, all individuals aged 17 and older were again surveyed on wealth, as they had been for the first time in 2002. These “raw” data were already part of the standard data distribution for Wave 24 and will be distributed with the upcoming data distribution in files containing the data for 2002 and 2007 in “long format”: PWEALTH for individual data and HWEALTH with data aggregated at the household level. Missing values due to item or partial unit non-response (e.g., missing interviews with individual household members in interviewed households) will be multiply imputed in complex procedures that take longitudinal information into account. Documentation on this is in preparation. An initial analysis of the new wealth data for 2002 and 2007 is provided in: Joachim R. Frick and Markus M. Grabka. 2009. Wealth Inequality on the Rise in Germany. Weekly Report 5 (10), 62-73.

3.2 Dataset $PEQUIV


3.3 Dataset HHRF/PHRF


3.4 Dataset $PGEN

  • EMPLST$$: A new category has been added to this variable ("Employment status"). From 1998 on, the SOEP data contain information on working in a sheltered workshop for the disabled. Since these persons do not report whether they work full-time, part-time, or on an irregular basis, the new category "sheltered workshop" has been included.


3.5 Dataset $HGEN
The domicile-related variables in the wave-specific $HGEN files have been completely revised. New additions include the full imputation of missing values (due to item non-response) for the housing-related variables number of rooms, heating costs, and gross rent excluding heating, as well as for the newly generated variable on utility costs in addition to rent. “Flag variables” indicate the imputation status where relevant. Experienced SOEP users should also note that various variable names in $HGEN have changed.


3.6 Dataset PPFAD

  • TODJAHR / TODINFO: To separate panel mortality from demographic reasons for dropping out of the SOEP sample, TNS Infratest carried out several studies to determine the current residence of panel dropouts, i.e., former respondents who no longer take part in the SOEP. This involved tracing 17,195 persons. These investigations identified 981 cases in which the dropout had died. Overall, 3,791 deaths had been identified in the SOEP up to 2008 (see also the documentation on the variables TODJAHR and TODINFO in the file PPFAD). In addition, a German-language report by our fieldwork organization TNS Infratest (“Wiederbefragung von Panelausfällen”) and an English-language summary are available.

Data distribution 2007 (Wave X)

The 2008 data distribution (1984-2007) provides, for the year 2007, the usual wave-specific data XPBRUTTO, XP, XPKAL, XPGEN, XHBRUTTO, XH, XHGEN, XKIND and WPLUECKE as well as the updated files with a longitudinal component (PFAD files, biography files, spell data and weighting factors).

In the survey year 2006, a representative supplementary sample for all of Germany was added: refreshment sample H. Biographical background information was collected from respondents in sample H for the first time in 2007. These data have been fully integrated into all relevant biography files (BIOxxxx).

As part of the SOEP innovation projects, TNS Infratest Sozialforschung conducted a postal survey in December 2006 among former SOEP panel members from households that had been classified as final refusals in 2001-2004. As a by-product, the year of birth could be changed from missing to a valid value for 21 of these persons (more information can be found in the executive summary of the TNS Infratest methods report, "Methodenbericht").

Furthermore, the following additions and modifications have been made:

A. New and Renamed Datasets

COGNIT06:
In the 2006 survey year, short cognitive tests were carried out with a subsample of the SOEP for the first time. The goal was to employ a robust set of instruments that trained interviewers could administer easily in just a few minutes. Close to 80% of all persons chosen for the cognitive test provided valid answers. Thus, for the first time, the SOEP now contains indicators of cognitive potential for more than 5,500 persons, along with diverse educational information based on degrees and certifications. The first repetition of the test is planned for the 2010 survey year. Detailed documentation and selection analyses can be found in Schupp et al. (2008): Erfassung kognitiver Leistungspotentiale Erwachsener im Sozio-oekonomischen Panel (SOEP), DIW Berlin, Data Documentation 32.


PBR_EXIT and PBR_HHCH:
These two datasets replace the former dataset YPBRUTTO; however, both variants are available this year.

MIHINC:
Multiply imputed dataset on monthly net household income for the years 1996 to 2007. The dataset is stored in long format, with records identified by hhnrakt, svyyear, and mj (also called "mim" format in Stata). Each item non-response on net household income was imputed 10 times. More information can be found in HGEN.pdf.
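Because each household-year appears once per imputation (indexed by mj), an estimate is usually computed separately for each imputation and then pooled, for example with Rubin's rules. The following is a minimal Python sketch of that pooling for an unweighted mean of net household income in one survey year. It assumes the rows have already been read into tuples of (hhnrakt, svyyear, mj, income); the name of the income column in MIHINC is not given here and is treated as hypothetical, and the sketch ignores weights and the survey design.

```python
from collections import defaultdict
from statistics import mean, variance


def rubin_combine(estimates, variances):
    """Pool m per-imputation estimates and their sampling variances (Rubin's rules)."""
    m = len(estimates)
    q_bar = mean(estimates)            # pooled point estimate
    u_bar = mean(variances)            # average within-imputation variance
    b = variance(estimates)            # between-imputation variance
    total_var = u_bar + (1 + 1 / m) * b
    return q_bar, total_var


def pooled_mean_income(rows, year):
    """rows: (hhnrakt, svyyear, mj, income) tuples in MIHINC-style long format.

    Computes the unweighted mean income separately for each imputation mj
    and pools the results. Weights and design effects are ignored here.
    """
    by_implicate = defaultdict(list)
    for _hh, svyyear, mj, income in rows:
        if svyyear == year:
            by_implicate[mj].append(income)
    estimates, variances = [], []
    for mj in sorted(by_implicate):
        values = by_implicate[mj]
        estimates.append(mean(values))
        variances.append(variance(values) / len(values))  # variance of the mean
    return rubin_combine(estimates, variances)
```

Pooling per imputation, rather than averaging the imputed incomes first, keeps the between-imputation variance in the total variance, so standard errors reflect the uncertainty introduced by the imputation.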

B. New Variables 

B.1 Dataset XPBRUTTO

  • XEWSTATU: Proxy information on non-responding persons regarding their labor force status in households with partial unit non-response.

 

B.2 Dataset $PEQUIV

  • P11101$$: Copy of the wave-specific variables on overall life satisfaction.

B.3 Dataset $HGEN

  • I_HINC$$: Multiply imputed version of HINC$$, the monthly net household income. Imputations 1-5 are available in wide format in $HGEN (only 1996-2007); all 10 generated imputations are available in long format in a separate dataset called MIHINC. Additional information can be found in HGEN.pdf.
  • FHINC$$: Imputation flag for I_HINC$$, 0 means not imputed and 1 otherwise.

C. Revised Variables

C.1 In the Dataset $PKAL

  • $P2D03 + $P2E03: In waves U-W (2004-2006), an incorrect "does not apply" missing code (-2) was corrected to the "no answer" missing code (-1) for some cases.
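In SOEP files, negative values are missing codes rather than substantive answers, so a correction from -2 to -1 moves cases from "filtered out" to "item non-response". A small Python sketch of separating the two codes named above from valid values; grouping any other negative code as "other missing code" is an assumption for illustration only.

```python
from collections import Counter

# Missing codes named in this section; any other negative value is grouped
# as "other missing code" (an assumption for this sketch).
MISSING_CODES = {-1: "no answer", -2: "does not apply"}


def split_missing(value):
    """Return (valid_value, missing_reason) for a raw SOEP item value."""
    if value < 0:
        return None, MISSING_CODES.get(value, "other missing code")
    return value, None


def tally_missing(values):
    """Count valid answers and each missing reason in a list of raw values."""
    counts = Counter()
    for v in values:
        _valid, reason = split_missing(v)
        counts[reason or "valid"] += 1
    return counts
```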

C.2 In the Dataset HHRF/PHRF

  • WPHRF*: All weighting factors for the year 2006 are now based on microcensus benchmark data from 2006.

    However, the weighting factors for the year 2007 are also based on the newest available microcensus benchmark data, those from 2006; they are therefore only provisional with regard to the figures given for households and individuals in Germany.

  • VHHRF + VHHRF1: One household from sample G was corrected and its weight set to 0.

 

C.3 In the Dataset $PGEN

  • LFS$$: The variable "labor force status" has been improved across all waves with respect to the accuracy of classifying individuals as "non-working and older than 65" (category 2). The month of birth is now used to determine whether a person was older than 65 at the time of the interview.
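Using the month of birth matters because a year-only calculation overstates age by one year for anyone whose birthday falls after the interview month. A minimal Python sketch, assuming year and month of birth and of the interview are available as integers; treating a birthday in the interview month as already reached, and reading "older than 65" as strictly greater than 65, are assumptions rather than the documented SOEP rule.

```python
def age_at_interview(birth_year, interview_year, interview_month, birth_month=None):
    """Completed years of age at the interview.

    If the month of birth is unknown, fall back to the year difference,
    which overstates age for people whose birthday is later in the year.
    Assumption: a birthday in the interview month counts as already reached.
    """
    age = interview_year - birth_year
    if birth_month is not None and interview_month < birth_month:
        age -= 1
    return age


def older_than(birth_year, interview_year, interview_month, birth_month=None, threshold=65):
    """True if the age at interview strictly exceeds the threshold (assumed reading)."""
    return age_at_interview(birth_year, interview_year, interview_month, birth_month) > threshold
```

For someone born in October 1940 and interviewed in March 2006, the year difference gives 66, while the month-aware age is 65.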

 

D. Error Updates

D.1 In the Dataset VH and WH

  • We have corrected value labels for the variables indicating the owner of the dwelling (VH27 and WH27), please note the relevant corrections in the table below.

    Variable Label: Owner Of The Dwelling

    Value | Wrong label | Correct label
    -2 | Does not apply | Does not apply
    -1 | No answer | No answer
    1 | Self Owned Res. Property | Local Govt. Apt.
    2 | Local Govt. Apt. | Co-Operative Apt.
    3 | Co-Operative Apt. | Company Apt.
    4 | Company Apt. | Private Owner
    5 | Private Owner | Do Not Know

D.2 In the dataset $PGEN

  • EGP$$: The variable "Erikson and Goldthorpe Class Category" (Erikson-Goldthorpe-Portocarero class schema) has been corrected with respect to the assignment of individuals to category (18) "not working - pensioner". Previously, all pension recipients, i.e., recipients of retirement pensions and recipients of widow's/orphan's pensions, were erroneously classified as "not working - pensioner" if none of the other categories applied. In the corrected generation of EGP$$, which applies to all waves, non-working persons are only assigned to this category if they receive a retirement pension, or if they receive an orphan's/widow's pension AND are older than 60. Moreover, if information on pension receipt is missing, additional information from ARTKALEN (retrospective information from the activity calendar for the previous year) is used to determine whether a person was in retirement or early retirement ("Vorruhestand") at the time of the interview. All other non-working persons are assigned to category (-2) "does not apply" unless they are registered as unemployed (category 15).
  • STIB$$: The same misclassification of individuals into the category "pensioner" (13) affected the variable for "Occupational position" and has been corrected for all waves in the same way as for EGP$$.
  • NACE$$: The variable for the "two-digit NACE Industry - Sector" had several inconsistencies with respect to the labeling. In particular, the labels for code (90) "Sewage And Refuse Disposal, Sanitation And Related" and code (95) "Private Households With Employed Persons" had to be swapped. Some other labels were not accurate, and have been stated more precisely for all waves.
  • IS88$$, ISEI$$, MPS$$, SIOPS$$, KLAS$$, EGP$$: The questions underlying these variables are not asked of all employed persons every year. In the survey years 1985, 1986, 1987, 1988, 1990 (West), 1992 (West), 1994, 1996, 1999, 2001, 2003, 2005, and 2006, only employed persons who changed jobs and first-time respondents are asked to provide up-to-date information. Hence, in years with a partial survey, these variables should contain the available previous year's information for all employed persons without a job change who did not update the information on their current occupation. However, for some individuals, the previous year's data were mistakenly not used. This was corrected by regenerating these variables for all waves in an accurate and consistent way.
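The corrected assignment rule for non-working persons in EGP$$ can be written as a small decision function. This Python sketch is an illustration, not the SOEP generation code: the argument names are hypothetical, None stands for missing pension information, and checking registered unemployment before the pensioner rule is an assumption about precedence.

```python
from typing import Optional

PENSIONER = 18        # "not working - pensioner"
UNEMPLOYED = 15       # registered unemployed
DOES_NOT_APPLY = -2


def egp_non_working(retirement_pension: Optional[bool],
                    survivor_pension: Optional[bool],
                    age: int,
                    retired_in_artkalen: bool,
                    registered_unemployed: bool) -> int:
    """Assign a non-working person to an EGP$$ code (hypothetical inputs).

    retirement_pension / survivor_pension: True, False, or None if missing.
    retired_in_artkalen: retirement or early retirement according to ARTKALEN.
    """
    if registered_unemployed:                 # assumed to take precedence
        return UNEMPLOYED
    if retirement_pension:
        return PENSIONER
    if survivor_pension and age > 60:         # widow's/orphan's pension AND older than 60
        return PENSIONER
    pension_info_missing = retirement_pension is None or survivor_pension is None
    if pension_info_missing and retired_in_artkalen:
        return PENSIONER
    return DOES_NOT_APPLY
```

Under the old rule, a 55-year-old widow's pension recipient would have been coded 18; under this sketch of the corrected rule, she is coded -2.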

Data distribution 2006 (Wave W)

The 2007 data distribution (1984-2006) provides, for the year 2006, the usual wave-specific data WPBRUTTO, WP, WPKAL, WPGEN, WHBRUTTO, WH, WHGEN, WKIND and VPLUECKE as well as the updated files with a longitudinal component (PFAD files, biography files, spell data and weighting factors).

In the survey year 2006, a representative supplementary sample for all of Germany was added: refreshment sample H. Detailed information on the integration of this sample and additional changes in both files using weighting and extrapolation factors can be found below (see item 2).

A further important change is the introduction of a new survey instrument for first-time respondents aged 17. These persons now receive an expanded youth questionnaire, which provides current information as a supplement to the biographical data already collected; the individual questionnaire previously used for this group is thus no longer needed. This also means that the survey population for the standard individual questionnaire (stored in the files $P) has changed slightly: since survey year 2006, 17-year-olds are no longer included. The first survey of sample H is an exception: there, 17-year-olds were still interviewed with the individual questionnaire, since the biographical survey in new subsamples starts only with the second wave. The revised $NETTO variables and the file $PAGE17 are of interest in this context (see below).

The educational variables in the generated datasets ($PGEN) have been revised: the integration of vocational qualifications attained abroad has been improved, and the corresponding variables have been subjected to extensive testing for consistency. These variables will be described in greater detail below.

The information on twins in SOEP was validated by a special survey of "potential" twins, and is integrated into the dataset BIOTWIN.

This year as in all previous years, the variables contained in the file WPEQUIV (wave 2006) relating to previous year's income take into account the various structural changes in the tax and transfer system, using these as part of the basic informational framework for generating and simulating annual income. Not only do the changes in the 2005 tax rate (reduction of the top tax rate, personal exemption) play an important role here but also the new guidelines contained in the Old Age Income Act (Alterseinkünftegesetz). The introduction of Unemployment Benefit II (Arbeitslosengeld II) also plays an important role, along with the extensive changes in the transfer system it entails (Social Security, Rent Subsidy, etc.). The generated information on (previous) year's income from SOEP survey year 2006 has thus been subject to thorough testing for internal and external consistency.

This year, the data is being distributed for the first time on DVD. This means that the language of variable and value labels can be chosen even more easily: right in the SOEP data installation program. If you install the data in Windows Vista using our setup program, please follow the installation instructions on DVD.

The following additions and modifications have been made:

New and Renamed Datasets 2006  

$PAGE17
From 2007 on, persons who have reached the age of their first individual SOEP interview (17 years) are not given the usual individual questionnaire but a special youth questionnaire. Wave-specific information not contained in the biographical data or other generated datasets (like $PGEN, HEALTH) are given in the dataset $PAGE17. Youth questionnaire respondents are identifiable with the help of the new $NETTO code "17" (see also the changes in the $NETTO variables in PPFAD). More information can be found in the biography documentation on our homepage and on the new DVD.

DESIGN
Starting in 2007, the information on SOEP sample design previously compiled in the dataset VARIANZ (Spiess 2001) is now being disseminated in a revised and amended dataset DESIGN. Preliminary documentation can be found in designdoku.pdf on our homepage and on the new DVD.

HEALTH
Starting with 2002, the SOEP health module in the individual questionnaire has been revised and put on a two-year replication cycle. In the HEALTH file, users find the generated SF-12 variables (measuring health-related quality of life) as well as variables on height and weight with imputation flags and a user-friendly, longitudinally checked generated variable for the Body Mass Index (BMI). More information can be found in health.pdf on the SOEP homepage or on the new DVD.

PWEALTH and HWEALTH
The wealth data asked in 2002 were thoroughly revised and checked for inconsistencies. The data are now provided in two (multiply) imputed datasets for the individual and the household level, with the corresponding flag variables for identification of the imputed values. The two datasets also each contain a generated variable on "net wealth" (see SOEPpapers No. 18).

Interviewer Survey
The interviewer dataset, available up to 2006 only as a "stand-alone" version, is now integrated into the standard data distribution under the name INTVIEW and is thus provided in the different software formats (SAS, SPSS, Stata).

Cross-Sectional Weighting Scheme 2006  

With the 2006 data distribution, important changes have been made in the cross-sectional weights. They are described in detail (in German) in the DIW Data Documentation 22.

1. Types of Weighting Factors Redefined
Each cross-sectional weight is designated $xHRFy. Here, $ represents the wave identifier, x the differentiation between households (x = H) and persons (x = P) and y an additional identifier that describes the type of weighting factor.

  • $xHRF are the weighting factors that have been used since the beginning. They contain all samples with the exception of high-income sample G.
  • $xHRF1 are the standard weighting factors, in which, in addition to excluding sample G, the weights of new subsamples in their first wave have been set to zero. The reason: in a complex survey, respondents in their first wave show "worse" response behavior than in later waves (for example, regarding life satisfaction and annual income). Sample C is an exception: respondents in the former GDR in 1990 did not exhibit the typical problems of first-time respondents (that is, GxHRF and GxHRF1 are identical).
    For standard cross-sectional analyses, we recommend the use of the $xHRF1 as a standard weighting factor. In this way, the information from the first waves of the different subsamples is automatically left out.
  • $xHRFALL include all available samples.
  • $xHRFD, $xHRFF and $xHRFG designate the isolated weights for immigrant sample D, for refreshment sample F and for high-income sample G.
  • The variable $PHRFXX in PHRF and HHRF has been deleted.
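When looping over waves, the weight names can be assembled from the $xHRFy scheme above (wave letter, H or P, then the type suffix), which yields names such as WPHRF1 or VHHRF. The Python sketch below does that and computes a simple weighted mean in which zero-weight cases, such as first-wave or sample G cases under $xHRF1, drop out automatically. It is a descriptive sketch only; it does not produce design-correct standard errors, and the function names are hypothetical.

```python
def weight_name(wave, level, kind=""):
    """Build a weight variable name following the $xHRFy scheme.

    wave: wave letter(s), e.g. "W" for 2006; level: "H" or "P";
    kind: "" ($xHRF), "1", "ALL", "D", "F" or "G".
    """
    level = level.upper()
    if level not in ("H", "P"):
        raise ValueError("level must be 'H' (household) or 'P' (person)")
    return f"{wave.upper()}{level}HRF{kind.upper()}"


def weighted_mean(values, weights):
    """Weighted mean; cases with weight 0 do not contribute."""
    pairs = [(v, w) for v, w in zip(values, weights) if w > 0]
    total = sum(w for _, w in pairs)
    if total == 0:
        raise ValueError("no cases with positive weight")
    return sum(v * w for v, w in pairs) / total
```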

2. Modifications to the External Information Used in the Weighting Scheme
Since 2005, the Federal Statistical Office no longer provides data distinguishing between East and West Berlin. This has led to minor retrospective changes in the external information on the number of households since survey year 2005.

3. New Refreshment Sample H
In 2006, SOEP was expanded to include another sample, refreshment sample H. The new households, which are representative of Germany as a whole, were also included in the weighting scheme. The integration of sample H is currently underway: tests are still being conducted to determine whether and how sample H can be further adapted to the external information. This is not a serious problem since, for descriptive analyses, we recommend in any case the use of the weighting factors WxHRF1, which exclude sample H.

4. Weighting Factors are Based on Benchmark Data from the 2005 Microcensus
The weighting factors for the year 2006 are based on microcensus benchmark data from 2005; they are therefore only provisional with regard to the figures given for households and individuals in Germany. Please address any questions to .

BIOAGE01 and BIOAGE17 2006  

1. BIOAGE01
Four new variables on pregnancy status have been generated, based essentially on the month of the interview from $P and the month and year of the child's birth, as well as the duration of pregnancy in weeks from BIOAGE01.

BCPREGY 'Mother: pregnant at the time of individual interview wave ($)?'
Value Labels:
2002 | Pregnant at Time of Personal Interview 2002
2003 | Pregnant at Time of Personal Interview 2003
2004 | Pregnant at Time of Personal Interview 2004
2005 | Pregnant at Time of Personal Interview 2005
2006 | Pregnant at Time of Personal Interview 2006
2007 | Pregnant at Time of Personal Interview 2007

BCPREGMO 'Mother: estimated month of pregnancy at the time of individual interview, wave($)'
Value Labels:
1 | First Month of Pregnancy
2 | Second Month of Pregnancy
3 | Third Month of Pregnancy
4 | Fourth Month of Pregnancy
5 | Fifth Month of Pregnancy
6 | Sixth Month of Pregnancy
7 | Seventh Month of Pregnancy
8 | Eighth Month of Pregnancy
9 | Ninth Month of Pregnancy
10 | Last Month of Pregnancy or after Birth

Furthermore, the beginning and end of pregnancy are also available as spell data. Analogously to BIOMARSM, for example, we start counting with month 1 (January 1983), such that December 2007 is month 300. The data are generated based on the month of birth and the duration of pregnancy in weeks from BIOAGE01.

PREGBEGM 'Spell - Month beginning of pregnancy / conception (1 = Jan 1983)'

PREGENDM 'Spell - Month end of pregnancy / Birth (1 = Jan 1983)'
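The spell month counter is a simple linear index, so January 1983 maps to 1 and December 2007 to (2007 - 1983) * 12 + 12 = 300. A Python sketch of the conversion in both directions follows. The helper approx_conception_month is hypothetical and purely illustrative: stepping back round(weeks * 12 / 52) months from the birth month is an assumed approximation, not the rule used to generate PREGBEGM.

```python
BASE_YEAR = 1983  # month 1 = January 1983


def to_month_index(year, month):
    """Convert calendar year/month to the spell month counter (1 = Jan 1983)."""
    return (year - BASE_YEAR) * 12 + month


def from_month_index(index):
    """Convert the spell month counter back to (year, month)."""
    year_offset, month0 = divmod(index - 1, 12)
    return BASE_YEAR + year_offset, month0 + 1


def approx_conception_month(birth_year, birth_month, weeks):
    """Hypothetical approximation, NOT the SOEP generation rule:
    step back round(weeks * 12 / 52) months from the birth month."""
    months = round(weeks * 12 / 52)
    return to_month_index(birth_year, birth_month) - months
```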

2. BIOAGE17
You will find detailed information on the structure and content of the dataset in the documentation of the biographical data on our SOEP homepage or on the DVD.

$HGEN 2006

NUTS1$$
In addition to the Bundesland (federal state) variable, starting this year, the corresponding NUTS (Nomenclature of Territorial Units for Statistics) Level 1 Variable is also provided. This variable is generally identical with $BULA in $HBRUTTO but without pooling Rheinland-Pfalz/Saarland (from 2000 on) and without differentiating between East and West Berlin.

$PGEN 2006 

1. New Variables
JOBCH$$

A variable identifying job changes was generated to supplement ERWTYP$$ (and eventually to replace it). The categories of this variable do not depend on whether the information was obtained in a first-time or a subsequent interview. For respondents to a subsequent interview, JOBCH$$ refers to job changes since the last interview; for first-time respondents, it refers to job changes since the beginning of the previous year. Respondents who started their first job and respondents who changed jobs are reported separately. In contrast to ERWTYP$$, JOBCH$$ has been checked for longitudinal consistency. Cases showing inconsistencies, such as duplicate entries of the same job change in two subsequent interviews, have been corrected.
Value Labels:
1 | Not Employed
2 | Employed No Change
3 | Employed No Info If Change
4 | Employed With Change
5 | First-Time Employed
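One kind of inconsistency mentioned above is the same job change being reported in two subsequent interviews. The exact SOEP check is not documented here, so the following Python sketch is only a hypothetical illustration. It assumes that each interview provides the JOBCH$$ code and a job start date as a (year, month) tuple, and it treats a reported change whose start date equals the start date from the previous interview as a duplicate, recoding it to "Employed No Change".

```python
NOT_EMPLOYED, NO_CHANGE, NO_INFO, WITH_CHANGE, FIRST_TIME = 1, 2, 3, 4, 5
EMPLOYED_CODES = {NO_CHANGE, NO_INFO, WITH_CHANGE, FIRST_TIME}


def recode_duplicate_changes(records):
    """records: (jobch, job_start) per consecutive interview of one person,
    with job_start a (year, month) tuple or None (hypothetical input).

    A reported change whose job start equals the start reported in the
    previous interview is treated as a duplicate and recoded to NO_CHANGE.
    """
    result = []
    prev_start = None
    for code, start in records:
        if code == WITH_CHANGE and start is not None and start == prev_start:
            code = NO_CHANGE
        result.append(code)
        prev_start = start if code in EMPLOYED_CODES else None
    return result
```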

2. Revised Variables
GERWZEIT, HERWZEIT
For the years 1990 and 1991, values for job tenure are now provided for sample C (East) as well. Given the potentially limited comparability due to the East German transformation process, this data should be handled with particular care.

$ERWZEIT
Job tenure has been tested for longitudinal consistency due to repeated evidence of inconsistencies. Cases that proved longitudinally inconsistent were corrected using the following procedure:

  1. Start of employment at current job as stated in the respondent's first survey is generally given precedence, and is carried on in subsequent years if no change of job occurred or the respondent did not take a new job after a break in employment.
  2. In the case of a change of job (change of employer / change to self-employment) current data on the time of job change is used and carried on in subsequent years.
  3. In the case where a respondent has taken up a new job after a break in employment, we assume that he or she returned to the old employer if the current data show a start of employment prior to the last survey year. In this case, we do not use the start of employment provided in the current survey but the start of employment from the last survey. If the current data show a start of employment since the last survey year, however, we assume that the respondent changed employer since the previous survey, and update the start of employment using the data from the current survey.

From the longitudinally consistent start of employment with current employer, we determine the duration of job tenure. When a respondent who started working again after a break can be assumed to have returned to his or her former employer, the full duration of job tenure is taken. The period of the break in employment is then not subtracted, potentially resulting in an implicit overestimation of firm-specific human capital.
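The three correction rules for the job start can be read as a left-to-right pass over a person's interviews. The Python sketch below assumes start dates at year precision and a hypothetical event coding ("none", "job_change", "new_after_break") per interview; the actual variables and the month precision used for $ERWZEIT may differ. As described above, a break in employment is not subtracted from tenure.

```python
from dataclasses import dataclass


@dataclass
class Interview:
    year: int            # survey year
    reported_start: int  # start year of current job as reported in this interview
    event: str           # hypothetical coding: "none", "job_change", "new_after_break"


def consistent_start_years(interviews):
    """Apply rules 1-3 above to one person's interviews in chronological order."""
    starts = []
    prev_start = prev_year = None
    for iv in interviews:
        if prev_start is None:
            start = iv.reported_start              # rule 1: first report takes precedence
        elif iv.event == "job_change":
            start = iv.reported_start              # rule 2: use the new start date
        elif iv.event == "new_after_break":
            if iv.reported_start < prev_year:
                start = prev_start                 # rule 3: returned to the old employer
            else:
                start = iv.reported_start          # rule 3: new employer since last survey
        else:
            start = prev_start                     # rule 1: carry the start forward
        starts.append(start)
        prev_start, prev_year = start, iv.year
    return starts


def tenure_years(survey_year, start_year):
    """Tenure in whole years; a break in employment is not subtracted."""
    return survey_year - start_year
```

In the usage below, the inconsistent 1990 start reported in 2002 is overridden by the first report (1995), and the 2001 start reported after a break in 2005 is read as a return to the employer started in 2003.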

AUSB$$
Since 1999, the survey has distinguished between studies at universities and at technical colleges for the required job training. Different categories have therefore now been created for the years before 1999 and from 1999 on: for the years since 1999, separate categories explicitly differentiate between these two kinds of educational qualification. Furthermore, technical colleges and technical schools are now designated separately.
AUSB$$ 'required job training'
Value Labels:
1 | No Training
2 | Introduction to Job
3 | On-The-Job Training
4 | Courses
5 | Vocational Training
6 | Technical School, Engineering (East) 1990-96
7 | Technical College or University, up to 1998
8 | Technical College, since 1999
9 | University, since 1999

MPS$$
For waves U, V, and W, values for Wegener's Magnitude Prestige Scale have been added for respondents without a household interview ($NETTO = 19).

ERWTYP$$
For the employment type variable, the old categories have been maintained but due to the common value for first-time job holders and those who have made a job change, the label for this category has been changed. Thus, the label 'employed, with change or first time employed' is now applied to the value 6.


3. Update Educational Variables
Thanks to our users, an error was identified in the generation of the educational variables in $PGEN, which had crept in some time ago in the process of retrospective generation for the years 2000 and 2001 and continued on since then. The error was in the variable $PBBIL02, and consisted in assigning foreign university degrees too high a value. The error came about through the integration of the variables $PBBILA and $PBBIL02 in these two years. All educational degrees have therefore now been generated again retrospectively for the years 2000 to 2006. The resulting variables $BILZEIT, ISCED$$ and CASMIN$$ have also been updated retrospectively from 2000 on.

PPFAD 2006 

Revision of the $NETTO Codes

$NETTO
With this year's wave W (23rd survey wave), 2006, the compilation of data on the survey population has changed fundamentally. Previously, an individual interview was carried out with all household members above the age of 16. As of 2006, the regular individual interviews based on the standard adult questionnaire are introduced one year later when household members reach the age of 18. Seventeen-year-olds instead receive an expanded youth questionnaire in their first year as SOEP respondents. (This applies to the old samples A-G; for the new sample H, distribution of this youth questionnaire will start next year, while this year's 17-year-olds have received the regular individual questionnaire, in line with the old system).
This means that we now have two instruments instead of one to obtain data on respondents: the individual and the youth questionnaire. To ensure a consistent differentiation over time, it will therefore be necessary either to include the youth population of the current year or to increase the age limit for all previous years.
The newly revised $NETTO variable assists retrospectively in both differentiations for the entire survey period. The connection between survey population and survey instrument can be retraced with the help of the variable $NETTO in PPFAD or $HNETTO in HPFAD. As a result of the change in the survey population as well as the expansion of the survey instrument to include detailed information on biographical contexts, the corresponding variable $NETTO in PPFAD has been fundamentally revised and is now provided as a two-digit variable. To ease the transition to the new variable, the old one-digit variable is still provided as well under a different name $NETOLD; the variable $HNETTO in HPFAD is unaffected by this and remains unchanged.
Value Labels:
10 | Respondent Completed Interview
11 | Individual Questionnaire
12 | Individual Questionnaire and Biography
13 | Individual and Youth Questionnaire
14 | Individual and other Questionnaires
15 | Individual Questionnaire and Experiments, Tests
16 | Individual Questionnaire, First-Time Respondent, Age 17
17 | Youth Questionnaire, First-Time Respondent, Age 17
19 | Individual Questionnaire without Household Interview

20 | Children in Household Interviewed ($KIND)
21 | Children with Mother-Child Questionnaire I, Age 0-1
22 | Children with Mother-Child Questionnaire II, Age 2-3

30 | Persons in successfully interviewed household without Individual Interview
31 | Completed Gap Interview ($LUECKE)
32 | Completed Biography Questionnaire
33 | Successful Youth Questionnaire
34 | Successful Tests and Experiments

60 | Only Questionnaire without Individual or Household Interview
61 | Gap Interview without household reference
62 | Gap Interview with drop out
70 | Only Participation in Tests, Experiments, etc.

80 | Individual did not withdraw from panel population
81 | Previous respondent lacking current information
89 | Repatriate - (was Drop Out)

90 | Individual Dropouts $YPBRUTTO
91 | Moved abroad
99 | Died
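Because the two-digit codes are grouped by their tens digit, a coarse classification can be derived directly from the code. The Python sketch below paraphrases the value labels above; the group wording is ours and hypothetical. INDIVIDUAL_QUESTIONNAIRE lists only the codes whose label names the individual questionnaire (code 10, "Respondent Completed Interview", is left out because its label does not). Keeping youth questionnaire respondents (17) separate from individual questionnaire respondents helps when reconstructing the population in $P.

```python
# Coarse groups keyed by the tens digit of the two-digit $NETTO code,
# paraphrasing the value labels listed above.
NETTO_GROUPS = {
    1: "respondent (individual or youth questionnaire)",
    2: "child in interviewed household",
    3: "household member without individual interview",
    6: "questionnaire without individual or household interview",
    7: "only tests or experiments",
    8: "not interviewed, still in panel population",
    9: "dropout",
}

# Codes whose value label names the individual questionnaire
INDIVIDUAL_QUESTIONNAIRE = {11, 12, 13, 14, 15, 16, 19}


def netto_group(code):
    """Return the coarse group for a two-digit $NETTO code, or None if unknown."""
    return NETTO_GROUPS.get(code // 10)
```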

$NETOLD
The new variable $NETOLD retains the old one-digit $NETTO coding. Persons aged 17 who filled out either a youth questionnaire (n=307) or an individual questionnaire (sample H, n=31) are both coded 1. As a result, the selection (WNETTO == 1 | WNETTO == 5) is not identical to the population in WP.

$PEQUIV 2006 

1. New Variables
ALG2$$: Sum of all transfers from Unemployment Benefit II (Arbeitslosengeld II) received by the household.
FALG2$$: Flag to identify the imputation of Unemployment Benefit II (ALG2$$).
IDEMY$$: Sum of indemnity payments received in the previous year.
FDEMY$$: Flag to identify the imputation of indemnity payments (IDEMY$$).
ITRAY$$: Sum of commuting and travel grants received in the previous year.
FTRAY$$: Flag to identify the imputation of commuting and travel grants (ITRAY$$).

2. Revised Variables
I11105$$
This variable (rental value of personally used living space, i.e., imputed rent) was previously generated only for persons living in owner-occupied housing. In line with recent research findings and European Community guidelines for the generation of imputed rent in EU-SILC, this notional income advantage is now also generated for persons in rental households who report paying below-market rents. These include people in rent-free housing, in socially subsidized housing, and in rental properties offered at a special rate (company dwellings, apartments provided by relatives at reduced rent, etc.).

W11101$$ and W11102$$
Due to the changes to the weighting factors in the files PHRF and HHRF, the variable W11101$$ now contains the individual weighting factor $PHRF1 (from the file PHRF) and the variable W11102$$ now contains the household weighting factor $HHRF1 (from the file HHRF).
Respondents show a significantly higher rate of item non-response in their first SOEP wave, which cannot be adequately corrected through imputation. For this reason, these two weights do not take into account the first wave of each new SOEP subsample. Furthermore, high-income subsample G has been excluded from the weighting scheme to prevent structural breaks between income analyses with and without this subsample. These two weighting variables are therefore particularly well suited to consistent time-series analyses of income inequality.

W11105$$
The variable W11105$$ now contains the individual weighting factor $PHRFALL (from the file PHRF). This weighting variable takes into account all SOEP subsamples.

E11105$$
The content of the variable E11105$$ is now based on the ISCO88 International Standard Classification of Occupations.

E11106$$ and E11107$$
The variables E11106$$ and E11107$$ now provide information on industry affiliation as a one- or two-digit code according to NACE, the statistical classification of economic activities in the European Community.

3. Deleted Variable
W11106$$ 'HH-Weight immigrant sample'

BIOBIRTH; BIOBRTHM 

KIDMON[n]
With wave W, the birth biographies of men (BIOBRTHM), like those of women (BIOBIRTH), include not only the year of birth of each child (KIDGEB[n], with n = 1...15) but also the month of birth (KIDMON[n]). This birth month is identical to the child's birth month given in PPFAD.

BIOTWIN 

In 2006, a separate survey was carried out in all households with twins. This twin survey had the goal of validating the data on all twins in SOEP and gaining new information. The following variables have been changed or added in BIOTWIN as a result:

BIOMONOZ
The variable BIOMONOZ differentiates between identical and fraternal twins based on a question asked of first-time respondents. Previously, this information was derived from a question asking whether the twins were of the same or different sexes. New codes have been introduced for BIOMONOZ to reflect the improved information available, so its values are no longer compatible with those of BIOMONOZ in versions of BIOTWIN prior to wave W.

INFOTWIN
The variable INFOTWIN has been introduced. This variable indicates whether information on twins was given in the 2006 twin survey, whether the information was derived from previously existing SOEP data, and whether previously existing data on the twins coincide with the results of the twin survey.

EGP$$  

The variable "Erikson and Goldthorpe Class Category" has been corrected with regard to the categorization of freelance academics, who were previously grouped with the self-employed (values of 5 or 6). The corrected generation process assigns freelance academics to the upper service class, which corresponds to a value of 1.
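The correction can be sketched as a small recode rule. This is a minimal illustration, not the SOEP generation script: the input flag `is_freelance_academic` is a hypothetical stand-in for however the actual process identifies academic freelancers (e.g., from occupational status and occupation codes).

```python
# Minimal sketch of the EGP$$ correction described above.
# ASSUMPTION: `is_freelance_academic` is a hypothetical input flag; the real
# SOEP script derives this information from other occupational variables.

UPPER_SERVICE_CLASS = 1  # EGP value for the upper service class


def corrected_egp(egp: int, is_freelance_academic: bool) -> int:
    """Return the EGP class after the freelance-academic correction.

    Freelance academics (previously coded 5 or 6, i.e., with the
    self-employed) are assigned to the upper service class; all other
    cases keep their original EGP value.
    """
    if is_freelance_academic:
        return UPPER_SERVICE_CLASS
    return egp
```

For example, `corrected_egp(5, True)` gives 1, while `corrected_egp(6, False)` leaves the value at 6.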


Data distribution 2005 (Wave V)

The 2006 SOEP data distribution (1984-2005, Waves A-V) includes the usual wave-specific data VPBRUTTO, VP, VPKAL, VPGEN, VHBRUTTO, VH, VHGEN, VKIND and UPLUECKE, as well as updated versions of all datasets with a longitudinal component (spell data, biographical data, and weights).

The first CD-ROM contains, as usual, all SOEP data with variable labels and value labels in German, and the second contains all SOEP data with variable labels and value labels in English.

Please also note the following improvements and changes:

New and renamed datasets 2005 

With the current data distribution, we renamed all SOEP datasets based on age-specific biographical questionnaires (e.g., "Mother and Child") in a more consistent manner. Since all these datasets are saved in long format, the names now start with "BIOAGE" and a two-digit suffix. This suffix gives the maximum age of the individuals in question during the survey year.

BIOAGE01
New name for the dataset previously known as BIOCHILD (based on the questionnaire for mothers with a newborn child below the age of 15 months).

BIOAGE03
New dataset based on mother-and-child questionnaire for mothers with a child between the ages of 2 and 3 years. For further information, please see the biographical data documentation.

BIOAGE17
New name for the dataset previously known as BIOYOUTH (based on a survey of adolescents between 16 and 17 years old).

Weighting 2005

The 2005 cross-sectional weights are provisional; an update of VPHRF and VHHRF will be released in fall 2006.

The wave-specific projection and weighting variables will be adjusted annually to external official data to ensure the accuracy of marginal distributions on age, sex, household size and nationality. The source of the data is the German Federal Statistical Office's official microcensus. From 2005 on, the data on Berlin will no longer be reported separately for the areas comprising former West Berlin / East Berlin; rather, Berlin will be considered part of East Germany. As a consequence, the data required to adjust our weights to the official marginal distributions will not be available before fall 2006.

To prevent this from causing a delay in the distribution of the SOEP data up to Wave V (2005), the weights (VPHRF* and VHHRF*) have been adjusted to the data used for Wave U (2004).

From our experience, the benchmark data deviate very little over the years (the new definition of West Berlin / East Berlin being one exception). Please keep in mind the provisional nature of the weighting scheme, and state this explicitly in any publications using the weights for Wave V. We will announce the final version, based on the 2005 microcensus data, via the SOEP NEWSLETTER and listserver as soon as it becomes available.

$HGEN 2005 

AHINC$$
The adjusted screener (AHINC$$) is now available for all waves (Exception: Sample C in 1990/1991).  

$PGEN 2005 

ALLBET$$ (new)
Coarse categories for company size. A variable on company size that is consistent across all waves (the "least common denominator" of the variable BETR$$).

Categories:

  1. "less than 20"
  2. "20 to 200"
  3. "200 to 2000"
  4. "2000 and above"
  5. "Self-employed with no other employees"

BETR$$ (revised):

The variable BETR$$ now has eleven instead of nine categories because of the more detailed question asked from Wave V onwards: the old category "5 to 20 employees" is now split into two categories ("5 to 10 employees" and "11 to 20 employees").

The new categories are:

  1. "less than 5"
  2. "GE 5 LE 10"
  3. "11 LT 20"
  4. "up to 1990: LT 20"
  5. "1991-2004: 5 LT 20"
  6. "GE 20 LT 100"
  7. "GE 100 LT 200"
  8. "up to 1998: GE 20 LT 200"
  9. "GE 200 LT 2000"
  10. "GE 2000"
  11. "Self-employed without employees"

TIP: The variable ALLBET$$ in the dataset $PGEN offers consistent data on company size throughout all waves of the SOEP, although with fewer, less detailed categories.
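To illustrate how the eleven BETR$$ codes nest within the five ALLBET$$ codes, here is a minimal recode sketch. The mapping below is inferred only from the category labels listed above; it is not the official SOEP generation script, and treating negative codes as missing values that pass through unchanged is an assumption.

```python
# Sketch: collapse BETR$$ codes (1-11) into ALLBET$$ codes (1-5).
# ASSUMPTION: mapping inferred from the category labels only.

BETR_TO_ALLBET = {
    1: 1,   # less than 5                  -> less than 20
    2: 1,   # GE 5 LE 10                   -> less than 20
    3: 1,   # 11 LT 20                     -> less than 20
    4: 1,   # up to 1990: LT 20            -> less than 20
    5: 1,   # 1991-2004: 5 LT 20           -> less than 20
    6: 2,   # GE 20 LT 100                 -> 20 to 200
    7: 2,   # GE 100 LT 200                -> 20 to 200
    8: 2,   # up to 1998: GE 20 LT 200     -> 20 to 200
    9: 3,   # GE 200 LT 2000               -> 200 to 2000
    10: 4,  # GE 2000                      -> 2000 and above
    11: 5,  # self-employed without employees
}


def allbet_from_betr(betr: int) -> int:
    """Map a BETR$$ code to ALLBET$$; negative (missing) codes pass through."""
    if betr < 0:
        return betr
    return BETR_TO_ALLBET[betr]
```

For instance, `[allbet_from_betr(c) for c in (2, 7, 10, -1)]` yields `[1, 2, 4, -1]`. In practice, using the delivered ALLBET$$ variable directly is preferable to re-deriving it.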

EMPLST$$ (new):
Employment Status. A consistent variable over all waves to differentiate employment status (in addition to the variable LFS$$, which differentiates non-employed persons).

Categories:

  1. "Full-time employment"
  2. "Regular part-time employment"
  3. "Vocational training"
  4. "Marginal, irregular part-time employment"
  5. "Not employed"

EXPFT$$ (new):
Working experience in full-time employment. Total working experience in full-time employment (in years, to one decimal place).

EXPPT$$ (new):
Working experience in part-time employment. Total working experience in part-time employment (in years, to one decimal place).

EXPUE$$ (new):
Unemployment experience. Total unemployment experience over the entire working life (in years, to one decimal place).


$PEQUIV 2005  

SSOLD$$ (new):
Social assistance to the elderly ("Grundsicherung im Alter").

FSSOLD$$ (new):
Imputation flag: Social assistance to the elderly.

LOSSR$$ (new):
Losses from renting and leasing.

FLOSSR$$ (new):
Imputation flag: losses from renting and leasing.

LOSSC$$ (new):
Losses from capital investment.

FLOSSC$$ (new):
Imputation flag: losses from capital investment.

D11112LL (new):
Race of individual

D11110$$ (erased):
Data already included in variable M11124$$.

D11111$$ (erased):
Data already included in variable M11125$$.


Bug fixes  

Correction of [T-U]HPOP in HPFAD.
Correction of some individual and household weights for the years 2003 and 2004 (THHRF, UPHRF, and UHHRF). 


Data Distribution 2004 (Wave U)

PPFAD 2004

LOC1989
The basic demographic information in PPFAD has been expanded to include location of residence in 1989, i.e., where an individual lived when the Berlin Wall fell (variable LOC1989). This information is differentiated into the categories "East Germany", "West Germany", and "Abroad" and is available for all respondents (adults and children, see further documentation in Biography and Life History Data).

PGEN 2004 

LABGRO$$ and LABNET$$
New variables have been generated for all waves (A-U) providing information on monthly gross and net labor income (LABGRO$$ and LABNET$$), consistently expressed in euros. Missing values due to item non-response are imputed, as indicated by the corresponding imputation flag variables IMPGRO$$ and IMPNET$$ (see also the additional documentation in PGEN.PDF).

HGEN 2004  

HINC$$
$HGEN now includes monthly net household income (HINC$$) under a consistent variable name and expressed in euros for all waves (A-U).

AHINC$$
A new variable has been generated for waves L-U (1995-2004) providing information on monthly net household income adjusted for possible underreporting (AHINC$$), also consistently expressed in euros. Possible underreporting is checked using the current individual incomes of all household members (see also the additional documentation in HGEN.PDF).

$PEQUIV or SOEP-CNEF 2004  

M11101$$-M11127$$
The $PEQUIV files now also include a set of cross-nationally harmonized health-related variables M11101$$-M11127$$ (see also the additional documentation in the Codebook for the $PEQUIV File 1984 - 2004).


Data Distribution 2003 (Wave T)

The data of the German SOEP (100% version) are distributed on three CD-ROMs covering the years 1984-2003. New data sets for the survey year 2003 are the usual wave-specific data TPBRUTTO, TP, TPKAL, TPGEN, THBRUTTO, TH, THGEN, TKIND and SPLUECKE. There are also updates of data sets with a longitudinal component (biographical data and weights). The information collected for the first time in 2003 in the biographical questionnaire for sample G ("high-income sample") has been completely integrated into the user-friendly biographical data sets.

As of this year, the data on CD-ROM #2 also contains all SOEP data with variable labels and value labels in English (including the data from the 1988 financial statement in file EV).

In addition, we have made the following additions and changes:

Sample G "High Income Sample" (Start 2002)  

The revised sampling design, using a higher income threshold, results in a smaller number of observations in wave 2.

HHRF and PHRF 2003 

The standard weighting variables for waves S and T (SPHRF and TPHRF, SHHRF and THHRF) are based on subsamples A-F, that is, without considering high-income sample G. In addition, we now offer new integrated weighting variables for all subsamples A-G ($PHRFAG and $HHRFAG; see also the documentation on the integrated weights for A-G vs. A-F).

Rectypes 2003

1. BIOCHILD: Information from the 'Mother and Child Questionnaire'
In this new file, information on newborns in the SOEP will be collected each year from now on (see further documentation in Biography Data).

2. BIORESID: Information on second residence in the first interview
The data set BIORESID includes information on length of residence and on second residences. The information comes from the biographical questionnaire, which has consistently contained questions on this since 1994 (see further documentation in Biography Data).
Contact: Thorsten Schneider

3. BIOBRTHM: Birth biography information for men - from 2001 on
This new data set includes information on the birth biographies of men interviewed with the modified questionnaire since 2001. BIOBRTHM is structured analogously to BIOBIRTH, based on a question formerly answered only by women (see further documentation in Biography Data).

4. BIOTWIN: data for identifying births of twins, triplets, etc.
BIOTWIN includes all identifiable births of twins, triplets, etc. in the SOEP. Identifiers (PERSNR) for the mother and siblings are included (see further documentation in Biography Data).

5. HBRUTT98:
This new file contains the complete gross population of sample E in the year 1998. It is useful in attrition analysis of the first wave of this sample.

BIOPAREN 2003

Variables on the nationalities of parents have been corrected (see further documentation in Biography Data).

PGEN 2003  

MODE$$ and MONTH$$
Two new variables have been generated for all previous waves to describe interview method and month (MODE$$ and MONTH$$). See also the additional documentation.

$PSBIL
Update of $PSBIL: For foreigners, the category "leave without graduating" [code 6] had to be updated in 2000, which in turn made it necessary to update $BILZEIT, ISCED$$ and CASMIN$$.
Contact: Bettina Isengard

$FAMSTD
The variable for marital status has been updated.

HGEN 2003  

HMODE$$ and HMONTH$$
Two new variables were generated for all previous waves to describe interview method and month (HMODE$$ and HMONTH$$). See also the additional documentation.

PPFAD 2003  

GEBMONAT
The central demographic information in PPFAD has been expanded to include month of birth (variable GEBMONAT). This information is now collected for all adults and children (see further documentation in Biography Data).

Update of EINTRITT, ERSTBEFR, AUSTRITT, LETZTBEF (see further documentation).

BIOBIRTH 2003  

The information on women's birth biographies was expanded to include information from the Youth Questionnaire, which is given to 16-17 year-olds being interviewed for the first time instead of the standard biographical questionnaire (see further documentation in Biography Data).

BIOIMMIG 2003  

This data was corrected to fix a case of miscoding in past years that occurred due to a reversal of the item sequence. This applies to the variables BIEXPRLV, BIEXPRAC and BIEXPRAN (see further documentation in Biography Data).

PFLEGE 2003

The new variable PNRCARE is now available for the years since 1999, that is, for waves P-T. PNRCARE is the time-invariant person number identifying the primary caregiver in a household. In three cases, the person identified as caregiver was identical with the person being cared for; in these cases, PNRCARE was set to -3 (implausible value). For the waves prior to 1999, PNRCARE has been assigned the value -2.
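The coding rules for PNRCARE can be summarized in a small function. This is a sketch for illustration only: the inputs (survey year and the person numbers of caregiver and care recipient) are hypothetical and do not correspond to specific SOEP variable names.

```python
# Sketch of the PNRCARE coding rules described above.
# ASSUMPTION: inputs are hypothetical; the actual variable is delivered
# ready-made in PFLEGE.

NOT_AVAILABLE = -2  # waves prior to 1999
IMPLAUSIBLE = -3    # caregiver identical with the person being cared for


def pnrcare(survey_year: int, caregiver_id: int, care_recipient_id: int) -> int:
    """Return the PNRCARE value for one care relationship."""
    if survey_year < 1999:
        return NOT_AVAILABLE
    if caregiver_id == care_recipient_id:
        return IMPLAUSIBLE
    return caregiver_id
```

When analyzing caregivers, both negative codes should be treated as missing rather than as person numbers.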
Contact: Rainer Pischner 

YPBRUTTO 2003  

Revision of HHNRAKT and HHNROLD for persons listed twice while living in a previous household.

$EQUIV 2003

All income data since 1984 are coded in euros.

As a supplement to the annual income aggregates offered thus far, we now add the individual income components (sum of all income earned by all household members, variables I111xx$$) with consistent variable names over time.

All information missing due to item-non-response was imputed and marked using flag variables.

All income variables are also included for sample G, but standard weights were used on the basis of subsamples A-F (see also the additional documentation).

 


Data Distribution 2002 (Wave S)

Rectypes 2002

1. HBRUTT02
In addition to the continuous, wave-specific gross (brutto) information on fieldwork progress (SPBRUTTO, SHBRUTTO), the new file HBRUTT02 also includes the households selected for the new subsample G that were not surveyed. HBRUTT02 therefore contains all households selected for subsample G, while information on the households of subsample G that were surveyed can also be found in the continuous household gross file SHBRUTTO. This matches the approach used for samples A (HBRUTT84), E (HBRUTT98) and F (HBRUTT00).

2. BIOSOC
The new data set BIOSOC contains information on the youth of everyone who has completed the biography questionnaire since 2000, such as arguments with parents, leisure activities, school grades and the federal state in which they last attended school.
Contact: Thorsten Schneider

BIOJOB 2002

The data set BIOJOB contains detailed information on first jobs. As of now this also includes ISCO88 data, occupational scales, classification schemes (ISEI, SIOPS, EGP, MPS) as well as information about the sector (BRANCHE). Information regarding last jobs is a new addition and can be found in BIOJOB.
Contact: Thorsten Schneider 

BIOPAREN 2002 

The person to contact for the update of the Prestige-Scores for parents is .

PGEN 2002 

AUTONO$$
This new variable is based on the answers to 'Occupational Status' and represents the degree of autonomy in a person's occupation.

STIB$$
This variable unifies the answers to 'Occupational Status' over all waves.

ISCED$$, CASMIN$$
The wave-specific $PGEN files have been retroactively expanded (from 1984 onwards) to include two further education variables (ISCED$$ and CASMIN$$), based respectively on the international classification schemes ISCED (International Standard Classification of Education) and CASMIN (Comparative Analysis of Social Mobility in Industrial Nations). This will improve the comparability of education-related analyses based on SOEP data.
Contact: Bettina Isengard

$EQUIV 2002 

Compared with the previous data distribution, the handling of item non-response has changed fundamentally for the annual income information and the aggregated income information contained in $PEQUIV. The established longitudinal imputation procedure for item non-response has been supplemented by a purely cross-sectional imputation for all income variables, which is used only when no individual longitudinal information is available. As a result, all missing income data in the $PEQUIV files have been replaced (for further information on the methodological procedure for the additional imputation, cf. Frick, J.R. and Grabka, M. (2003): Missing Income Data in the GSOEP: Incidence, Imputation and its Impact on the Income Distribution).

Consequently, all imputation flags have been revised. They now give the share of imputed income in the respective income aggregate: if all information is present, the value is 0; if there is any item non-response, the value can be anything up to 100.
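The meaning of the revised flags can be illustrated with a short calculation. This is a minimal sketch assuming that an aggregate splits into a reported part and an imputed part, both non-negative; the function and argument names are hypothetical, and the behavior for an aggregate of zero is an assumption.

```python
# Sketch: imputation flag as the percentage share of imputed income
# in an income aggregate.
# ASSUMPTIONS: both components are non-negative; a zero aggregate gets 0.


def imputation_flag(observed: float, imputed: float) -> float:
    """Return the imputed share (0-100) of the aggregate observed + imputed."""
    total = observed + imputed
    if total == 0:
        return 0.0  # nothing to impute (assumed convention)
    return 100.0 * imputed / total
```

For a household aggregate of 40,000 of which 10,000 was imputed, `imputation_flag(30000, 10000)` returns 25.0; a fully reported aggregate returns 0.0, and a fully imputed one returns 100.0.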

In addition, complete income information for the new sample F is now also available for the years 2000 to 2002.

CNEF data are not yet available for the first wave of sample G, as the methodologically demanding imputation algorithms applied by the SOEP require longitudinal data.

DM-EURO conversion

Income in $PEQUIV always refers to the previous year; this means that data collected in 2002 for the 2001 income year are still in DM. All $PEQUIV information will be converted to euros in the next data distribution. By contrast, all data in the $P files correspond to the information collected with the original questionnaire, i.e., data collected in 2002 are stored in euros and data collected in 2001 in DM, the currency used in the respective questionnaire.
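Users who want to combine DM and euro amounts before the converted release can apply the irrevocably fixed rate of 1 euro = 1.95583 DM. The sketch below assumes that this official rate is the appropriate conversion for the analysis at hand and rounds to the cent; it does not claim to reproduce the SOEP's own conversion procedure.

```python
from decimal import Decimal, ROUND_HALF_UP

# Irrevocably fixed conversion rate: 1 euro = 1.95583 Deutsche Mark.
DM_PER_EURO = Decimal("1.95583")
CENT = Decimal("0.01")


def dm_to_euro(amount_dm: Decimal) -> Decimal:
    """Convert a DM amount to euros, rounded half-up to the cent."""
    return (amount_dm / DM_PER_EURO).quantize(CENT, rounding=ROUND_HALF_UP)
```

For example, `dm_to_euro(Decimal("1000"))` gives `Decimal("511.29")`. Decimal arithmetic is used instead of floats so that the rounding to the cent is exact.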

 


Data Distribution 2001 (Wave R)

With the 18th wave of the SOEP, the concept for constructing cross-sectional weights has been changed slightly. This change affects neither the derivation of the staying probabilities nor the construction of the weights for subsample D units.

For more details, please see the Newsletter 60, April 2003.

With the current release of SOEP data (survey years 1984-2001), the coding frame for industry and occupation (first and second job) has been changed to the international standards NACE and ISCO88, respectively. Long-standing SOEP users in particular should be aware that the variables ISCO$$, ISCOU$$, ISCOH$$ and $BRANCHE are no longer available. The corresponding new variables in the files $P and $PGEN are described in detail in the documentation of the generated person-level variables (see file pgen.pdf).

In addition, the SOEP group at DIW is currently fixing some minor bugs and deficiencies in the current data release. First, the variables TODJAHR and TODINFO in the file PPFAD, which give the year of death and the source of the death information, will include all mortality information from a recent follow-up study ("Verbleibstudie 2001") carried out by Infratest. Second, the variable $ERWZEIT in the file $PGEN will be updated so that there is valid information on the number of years with the current employer for all employed respondents in subsample C. Third, the variable RP4002 in the file RP (occupational status: self-employed) and the variables RHHTAGIN, RHHMONIN and RHINTNR in RH (day and month of the interview and the interviewer's ID) had not been defined properly. All these problems will be fixed in the next data release. Users who need these variables should subscribe to our listserver to receive information about these updates sooner.


Data Distribution 2000 (Wave Q)

Rectypes 2000

1. VARIANZ
In addition to the household indicator this file contains the variables STRAT1, STRAT2, SAMPOINT and INTNR. Some software packages (such as STATA, SUDAAN) are able to use these to estimate variances. All four variables provide information on the respective subsample for the start of each first wave, i.e. they are saved at the case-level (variable HHNR).
STRAT1 identifies the strata that were relevant for drawing the primary sampling units (PSUs) for the respective sample. For subsample B, these were the five nationalities. Therefore, "artificial" strata corresponding to those of the other subsamples were created for subsample B and stored in STRAT2.
The variable SAMPOINT identifies the respective PSU (e.g. in subsample A voting constituencies, in Subsample D not present).
Due to data protection laws the various values of the variables STRAT1, STRAT2 and SAMPOINT were given transformed values, in order to prevent regional units from being identified.
The variable INTNR contains an interviewer number, so that clusters of households surveyed by the same interviewer can be identified.
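As an illustration of how stratum and PSU identifiers such as STRAT1 (or STRAT2) and SAMPOINT enter a variance estimate, the sketch below computes a weighted mean and its Taylor-linearized standard error under a stratified cluster design, treating PSUs as drawn with replacement (the "ultimate cluster" approximation commonly used by survey software). The record layout `(stratum, psu, weight, y)` is hypothetical; in practice the values would come from merging VARIANZ onto person data via HHNR together with a weight such as $PHRF. Because only the grouping matters, the transformed codes of STRAT1 and SAMPOINT can be used as they are. Skipping strata with a single PSU is an assumed convention, one of several in use.

```python
from collections import defaultdict
from math import sqrt


def weighted_mean_se(records):
    """Weighted mean and linearized SE for (stratum, psu, weight, y) records."""
    records = list(records)
    w_sum = sum(w for _, _, w, _ in records)
    mean = sum(w * y for _, _, w, y in records) / w_sum

    # Linearized score of each observation, summed to PSU totals per stratum.
    psu_totals = defaultdict(lambda: defaultdict(float))
    for stratum, psu, w, y in records:
        psu_totals[stratum][psu] += w * (y - mean) / w_sum

    variance = 0.0
    for psus in psu_totals.values():
        n_h = len(psus)
        if n_h < 2:
            continue  # ASSUMPTION: single-PSU strata are skipped
        totals = list(psus.values())
        z_bar = sum(totals) / n_h
        # Between-PSU variability within the stratum, with-replacement factor.
        variance += n_h / (n_h - 1) * sum((z - z_bar) ** 2 for z in totals)
    return mean, sqrt(variance)
```

With two strata of two PSUs each (y = 1, 3 and y = 5, 7, all weights 1), the function returns a mean of 4.0 and a standard error of sqrt(0.5). Using INTNR instead of SAMPOINT as the cluster identifier would, in the same way, account for interviewer clustering.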

2. HBRUTT00
As with the 1998 supplementary sample (sample E), this file contains all gross (brutto) information on all households of the Innovation Sample (sample F) that were contacted in 2000 using the random-route method, regardless of whether these households were successfully surveyed. Such information can be used for methodological investigations of household participation in (SOEP) surveys.

3. QJUGEND
In 2000, a youth questionnaire was introduced to be used instead of the biography questionnaire. It was aimed at all "new" participants who had reached the minimum age of 16 and were therefore eligible to take part in the SOEP survey. The 232 data records available so far supplement the information these respondents gave when completing the individual questionnaire for the first time, providing retrospective details on education as well as basic indicators of educational success. The youth questionnaire indicators were thoroughly revised and supplemented in 2001, and the young participants of sample F answered this new questionnaire for the first time. As a result, the data set QJUGEND represents, so to speak, a kind of pretest for the newly prepared biography data set BIOYOUTH (available from 2001 onwards).

Reworking of labels  

The VAR LABELS and VALUE LABELS have been completely reworked for all previous years (up to and including 1999). Missing labels were added where applicable, and the labeling scheme was standardized (for instance, for sub-items or variables with just one answer category). Furthermore, the labels were made consistent over time. At the same time, the reworked label text was transferred to the English labels, so that these are now also retrospectively fully consistent with the German labels.

$PGEN 2000

For the current data distribution, extensive revisions were made to the variables from earlier waves. For instance, there are now far fewer missing values of -1 (k.A., no answer) for many occupation-related variables. The education variables in all $PGEN files were reworked and supplemented. New variables include a differentiated labor force status for all participants and education information generated from data first collected in 2000 on the highest level of education and employment achieved to date. The existing generated education variables were retrospectively reworked, extrapolated and supplemented: data are now available for temporarily absent respondents, as well as information on current school attendance, apprenticeship or studies. Furthermore, the variable BETR$$ in $PGEN was recoded (the questions on firm size, and therefore the codes in SOEP, have changed over time). Please take this into account when updating your programs.

$PEQUIV 2000  

The $PEQUIV files were updated. This affects:

  • the extension of the population
  • the reworking of the variable IMPUTED RENT
  • new variables used to generate equivalence scales
  • a reworking of the variables related to ANNUAL WORKING HOURS



Data Distribution 1999 (Wave P)

Rectype 1999

INTERVIEW
This interviewer data set contains information on the sex, age, education, occupation and marital status of 1,048 interviewers who worked on samples A, B, C and D from survey wave 1 up to wave 12 (see documentation).

For more information concerning the data distributions back to 1995 please refer to our German Site.