SOEP-Core v28 - Changes in the Dataset

Dataset Information

1. New additional missing codes

With the integration of sample J in 2011, the biographical questionnaire was moved from the second to the first wave and combined with the individual questionnaire into a single integrated survey. As a result, the survey instrument differs slightly between the old samples A-H and the supplementary sample J.

The following additional missing codes have been introduced to the survey data to document these possible differences:

-4 "Inadmissible multiple response"
-5 "Not included in this version of the questionnaire"
-6 "Version of questionnaire with modified filtering"
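As a minimal sketch of how these codes might be handled in an analysis, the following Python snippet maps the three new codes to their labels and masks them before computing statistics. The function names and the example values are hypothetical, and other negative missing codes used in the data are deliberately not covered here.

```python
# Labels for the three missing codes introduced in this release (taken from the list above).
NEW_MISSING_CODES = {
    -4: "Inadmissible multiple response",
    -5: "Not included in this version of the questionnaire",
    -6: "Version of questionnaire with modified filtering",
}


def missing_label(value):
    """Return the label if value is one of the new missing codes, else None."""
    return NEW_MISSING_CODES.get(value)


def mask_new_missing(values):
    """Replace the new missing codes with None; leave all other values untouched."""
    return [None if v in NEW_MISSING_CODES else v for v in values]


# Hypothetical answers to a single survey item (not real SOEP data).
answers = [3, -5, 1, -4, 7, -6]
print(mask_new_missing(answers))
```

Masking the codes rather than dropping the rows keeps the respondent in the data, so the reason for the missing value can still be inspected with `missing_label`.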

2. Sample I now part of our new Innovation Sample

The SOEP Innovation Sample (SOEP-IS) has now been launched and includes, among others, sample I. Sample I is therefore no longer part of the main survey as of 2011. See SOEP-IS on our website for further information about the Innovation Sample and the possibility of including your own questions.

3. New and renamed datasets

3.1 BIOCOUPLM
BIOCOUPLM provides spell data on partnership histories from the first to the last personal interview of a respondent. Spells are measured on a monthly basis.

3.2 BIOCOUPLY
BIOCOUPLY provides spell data on partnership histories. It contains annual information on partnership status since the respondent’s year of birth, including available retrospective data and annually updated information.

3.3 BIOSIB (beta version)
The new file BIOSIB provides information on siblings living in the SOEP households. The dataset contains the person numbers of all siblings in an observed family. It includes information on their gender, their year of birth, and on the relationship between the observed siblings.
BIOSIB is included as a beta version in the current data release. Please do not hesitate to send both positive and negative feedback or suggestions to Daniel Schnitzlein.

3.4 BIOEDU
The BIOEDU dataset contains details on educational transitions, from entry into childcare up to tertiary education, in a consistently structured form.

3.5 BIOAGE long
The new integrated BIOAGE long dataset (BIOAGEL) presents data in “long” format, i.e., it contains information from BIOAGE01, BIOAGE03, BIOAGE06, BIOAGE08a, and BIOAGE08b.

Dataset on the Economic Behavior Experiment on Trust and Trustworthiness in the 2003, 2004, & 2005 SOEP Survey

This experiment to measure trust is based on the investment game introduced by Berg et al. (1995), a one-shot game for two players or movers who anonymously interact with each other. The first mover receives an endowment of 10 points and can transfer zero to ten points to the second mover. Every point that is transferred is doubled by the experimenters. The second mover is also given an endowment of ten points. After receiving points from the first mover, he/she decides on how much of the endowment to transfer back to the first mover (zero to ten points). As with the first mover's transfer, the back-transfer by the second mover is doubled by the experimenters. After the second mover's decision, the game ends and the subjects are paid their income in euros (one point equals one euro) by check sent a few days later.

A fundamental component of the game is that the participants actually receive money in accordance with the fixed payout function, i.e., all the decisions always have monetary consequences. This version of the game was developed by Fehr, Fischbacher, Schupp, von Rosenbladt & Wagner (2002).
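The payout rule described above can be written down as a short function. This is only a sketch read off the description in this section (endowment of ten points each, every transferred point doubled); the function and parameter names are hypothetical and not taken from the TRUST dataset.

```python
ENDOWMENT = 10   # points given to each mover
MULTIPLIER = 2   # every transferred point is doubled by the experimenters


def trust_game_payoffs(sent, returned):
    """Return (first_mover_points, second_mover_points) for one game.

    sent:     points the first mover transfers to the second mover (0 to 10)
    returned: points the second mover transfers back from the endowment (0 to 10)
    """
    for name, value in (("sent", sent), ("returned", returned)):
        if not 0 <= value <= ENDOWMENT:
            raise ValueError(f"{name} must be between 0 and {ENDOWMENT}")
    first_mover = ENDOWMENT - sent + MULTIPLIER * returned
    second_mover = ENDOWMENT + MULTIPLIER * sent - returned
    return first_mover, second_mover


# Example: the first mover sends 5 points, the second mover sends back 3.
print(trust_game_payoffs(5, 3))  # (11, 17)
```

If neither mover transfers anything, both keep their ten points; if both transfer everything, both end up with twenty, which is what makes trust and trustworthiness mutually profitable in this design.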

The combination of representative survey and behavioral experiment was used in the SOEP main surveys in 2003, 2004, and 2005, with only minor modifications. Of the 1,432 original participants in 2003, 1,202 also took part in the experiment in 2004 and 2005.

The data are available in long format in the "TRUST" dataset. Consequently, this dataset contains information from each of the three waves in which the behavioral experiment was conducted.

Dataset on the Economic Behavior Experiment on Time Preferences in the 2006 SOEP Survey

In this experiment on economic behavior, respondents were asked to decide how they would like to receive €200 in prize money: immediately by check, or later as a larger amount, that is, with interest. By splitting the sample (N = 1,503 persons) into random subsamples (splits), it was possible to vary both the time horizon and the implied interest rate in order to test possible incentive effects on the choice between a lower payoff in the short term and a higher payoff in the long term. The scientific director of the project was Prof. Dr. Armin Falk, CENs, University of Bonn.

4. New or revised variables

4.1 $HBRUTTO dataset

The $HBRUTTO dataset will include a new variable to distinguish between urban, suburban and rural regions. This is based on the spatial categories of counties (as of December 31, 2009) used by the Federal Institute for Research on Building, Urban Affairs and Spatial Development (BBSR). The following spatial structure characteristics are used to define the categories:

  • Share of county’s population in large or medium-sized cities
  • Population density of the county
  • Population density of the county without taking large or medium-sized cities into consideration

Thus, three categories can be defined:

  1. Urban regions (Cities with at least 100,000 inhabitants and counties with at least 50% of the population living in large or medium-sized cities and with a population density of at least 150 inhabitants/km²; and counties with a population density not including large or medium-sized cities of at least 150 inhabitants/km²)
  2. Regions undergoing urbanization (Counties with at least 50% of the population living in large or medium-sized cities but a population density of below 150 inhabitants/km², and counties with less than 50% of the population living in large or medium-sized cities, and with a population density (excluding large or medium-sized cities) of at least 100 inhabitants/km²)
  3. Rural regions (counties with less than 50% of the population living in large or medium-sized cities and population density (excluding large or medium-sized cities) of below 100 inhabitants/km²).
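The three categories above can be sketched as a small decision function. This is an illustration only, not the BBSR's or SOEP's actual implementation: the function and parameter names are hypothetical, and checking the categories in the order urban, urbanizing, rural is an assumption made so that every county falls into exactly one category.

```python
def classify_region(is_large_city, share_in_cities, density, density_excl_cities):
    """Return 1 (urban), 2 (undergoing urbanization), or 3 (rural).

    is_large_city:       True for a city with at least 100,000 inhabitants
    share_in_cities:     share of the population in large or medium-sized cities (0.0 to 1.0)
    density:             inhabitants per km² in the county
    density_excl_cities: inhabitants per km² excluding large or medium-sized cities
    """
    # Category 1: urban regions
    if is_large_city:
        return 1
    if share_in_cities >= 0.5 and density >= 150:
        return 1
    if density_excl_cities >= 150:
        return 1
    # Category 2: regions undergoing urbanization
    if share_in_cities >= 0.5:
        # High city share, but overall density below 150
        return 2
    if density_excl_cities >= 100:
        return 2
    # Category 3: rural regions (city share below 50%, density excl. cities below 100)
    return 3


# Hypothetical counties (not real BBSR data).
print(classify_region(False, 0.6, 120, 80))  # 2
print(classify_region(False, 0.3, 90, 60))   # 3
```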


4.2 $PGEN dataset

BILZTCH$$ indicates whether the respondent’s answers suggest a downward shift in years of education or training ($BILZEIT) since the last observation, or an upward change since the last year that is inconsistent with additional information on recently completed education or training.
A second flag variable indicates whether the respondent showed any inconsistent change in $BILZEIT, either upwards or downwards, over the entire observation period.


To be consistent with the FID dataset, the missing values of the variables $VEBZEIT and $UEBSTD were slightly recoded: the missing value –2 is now assigned to self-employed individuals. In previous waves, self-employed persons had the missing value –3 (implausible answer).

For $UEBSTD, the value –3 (implausible answer) is assigned to all individuals with more than ten hours of weekly overtime AND who also had an agreed working time of over 80 weekly hours ($VEBZEIT is implausible, value –3) or actual weekly working time of more than 80 hours a week ($TATZEIT is implausible, value –3).
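The recoding rules for $UEBSTD described above can be sketched as follows. The function and parameter names are hypothetical, the inputs are assumed to be the reported hours before any recoding, and checking self-employment first is an assumption about the order in which the rules apply.

```python
def recode_overtime(overtime, agreed_hours, actual_hours, self_employed=False):
    """Return the recoded weekly overtime value.

    overtime:      reported weekly overtime hours
    agreed_hours:  reported agreed weekly working time
    actual_hours:  reported actual weekly working time
    self_employed: True if the respondent is self-employed
    """
    # Self-employed individuals now receive the missing value -2.
    if self_employed:
        return -2
    # More than ten hours of overtime combined with implausible working time
    # (agreed or actual time above 80 hours a week) is coded -3.
    if overtime > 10 and (agreed_hours > 80 or actual_hours > 80):
        return -3
    return overtime


# Hypothetical respondents (not real SOEP data).
print(recode_overtime(12, 85, 40))                     # -3
print(recode_overtime(12, 40, 45))                     # 12
print(recode_overtime(5, 40, 45, self_employed=True))  # -2
```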

4.3 BIOPAREN dataset

Seven new variables have been added to BIOPAREN:
VAORT11 and MAORT11 indicate the father’s and mother’s current place of residence.
The remaining five variables provide information on siblings. GESCHW indicates whether, at the time of the interview, the respondent had ever had any siblings. GESCHWUP gives the year in which the sibling information was collected. NUMB and NUMS provide the number of brothers and sisters the respondent reports, and TWIN indicates whether any of these are twin siblings of the respondent (and of which type).