SOEP-Core v22 - Changes in the Dataset

Dataset Information

The 2006 SOEP data distribution (1984-2005, Waves A-V) includes the usual wave-specific data VPBRUTTO, VP, VPKAL, VPGEN, VHBRUTTO, VH, VHGEN, VKIND and UPLUECKE, as well as updated versions of all datasets with a longitudinal component (spell data, biographical data, and weights).
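The wave lettering stated above (1984 = Wave A through 2005 = Wave V) can be sketched in a few lines of Python. This is only an illustration: the helper names `wave_letter` and `expand` are hypothetical, and reading the placeholder `$$` in variable names such as BETR$$ as the two-digit survey year is an assumption that this page does not state explicitly.

```python
FIRST_YEAR = 1984  # Wave A
LAST_YEAR = 2005   # Wave V, the latest wave in this distribution


def wave_letter(year: int) -> str:
    """Return the wave letter for a survey year (1984 -> 'A', 2005 -> 'V')."""
    if not FIRST_YEAR <= year <= LAST_YEAR:
        raise ValueError(f"year {year} is outside {FIRST_YEAR}-{LAST_YEAR}")
    return chr(ord("A") + year - FIRST_YEAR)


def expand(template: str, year: int) -> str:
    """Fill in wave placeholders in a dataset or variable name.

    Assumption: '$$' stands for the two-digit survey year and a single '$'
    stands for the wave letter.
    """
    two_digit_year = f"{year % 100:02d}"
    # Replace '$$' first so that it is not consumed as two single '$'.
    return template.replace("$$", two_digit_year).replace("$", wave_letter(year))


print(expand("$PGEN", 2005))   # VPGEN
print(expand("BETR$$", 2005))  # BETR05
```

Under these assumptions, `expand("$PGEN", 2005)` gives the wave-specific dataset name VPGEN listed above.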

The first CD-ROM contains, as usual, all SOEP data with variable labels and value labels in German, and the second contains all SOEP data with variable labels and value labels in English.

Please also note the following improvements and changes:

New and renamed datasets 2005 

With the current data distribution, we renamed all SOEP datasets based on age-specific biographical questionnaires (e.g., "Mother and Child") in a more consistent manner. Since all these datasets are saved in long format, the names now start with "BIOAGE" and a two-digit suffix. This suffix gives the maximum age of the individuals in question during the survey year.
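As a small illustration of the naming rule just described, the following Python sketch builds a dataset name from the maximum age. The function name `bioage_name` is hypothetical, and the sketch only applies the stated rule (prefix "BIOAGE" plus a two-digit maximum age); it is not a list of the datasets actually shipped.

```python
def bioage_name(max_age: int) -> str:
    """Build a dataset name from the 'BIOAGE' prefix and a two-digit maximum age."""
    if not 0 <= max_age <= 99:
        raise ValueError("the suffix has exactly two digits")
    return f"BIOAGE{max_age:02d}"


# Example: a questionnaire covering children up to age 3 in the survey year.
print(bioage_name(3))  # BIOAGE03
```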

New name for the dataset previously known as BIOCHILD (based on the questionnaire for mothers with a newborn child below the age of 15 months).

New dataset based on mother-and-child questionnaire for mothers with a child between the ages of 2 and 3 years. For further information, please see the biographical data documentation.

New name for the dataset previously known as BIOYOUTH (based on a survey of adolescents between 16 and 17 years old).

Weighting 2005

The 2005 cross-sectional weights are provisional; an update of VPHRF and VHHRF will be released in fall 2006.

The wave-specific projection and weighting variables are adjusted annually to external official data to ensure the accuracy of the marginal distributions of age, sex, household size, and nationality. The source of these data is the German Federal Statistical Office's official microcensus. From 2005 on, the data on Berlin will no longer be reported separately for the areas of former West Berlin and East Berlin; rather, Berlin will be considered part of East Germany. As a consequence, the data required to adjust our weights to the official marginal distributions will not be available before fall 2006.

To prevent this from causing a delay in the distribution of the SOEP data up to Wave V (2005), the weights (VPHRF* and VHHRF*) have been adjusted to the data used for Wave U (2004).

In our experience, the benchmark data deviate very little from year to year (the new definition for West Berlin / East Berlin being one exception). Please keep in mind the provisional nature of the weighting scheme, and indicate this explicitly in any publications using the weights for Wave V. We will inform you via the SOEP NEWSLETTER and listserver as soon as the final version, based on the 2005 microcensus data, becomes available.
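For readers new to cross-sectional weights, the following minimal Python sketch shows how a weight such as VPHRF would typically enter a weighted mean. The function name `weighted_mean` and all numbers are made up for illustration; they are not SOEP data, and the sketch does not describe how the weights themselves are constructed.

```python
def weighted_mean(values, weights):
    """Return sum(w * x) / sum(w) for paired observations and weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("the sum of the weights must be positive")
    return sum(v * w for v, w in zip(values, weights)) / total_weight


# Hypothetical example: three respondents with made-up incomes and weights.
incomes = [1000.0, 2000.0, 3000.0]
provisional_weights = [1.0, 1.0, 2.0]  # stand-ins for values of VPHRF
print(weighted_mean(incomes, provisional_weights))  # 2250.0
```

Because the Wave V weights are provisional, an estimate computed this way may change slightly once the final weights are released.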

$HGEN 2005 

The adjusted screener (AHINC$$) is now available for all waves (Exception: Sample C in 1990/1991).  

$PGEN 2005 

ALLBET$$ (new):
Broad categories for the size of the company. A consistent variable over all waves for the size of the company ("least common denominator" of the variable BETR$$).


  1. "less than 20"
  2. "20 to 200"
  3. "200 to 2000"
  4. "2000 and above"
  5. "Self-employed with no other employees"

BETR$$ (revised):

The variable BETR$$ now has eleven categories instead of nine, because company size has been asked in more detail from Wave V onwards. The old category "5 to 20 employees" is now split into two categories ("5 to 10 employees" and "11 to 20 employees").

The new categories are:

  1. "less than 5"
  2. "GE 5 LE 10"
  3. "11 LT 20"
  4. "up to 1990: LT 20"
  5. "1991-2004: 5 LT 20"
  6. "GE 20 LT 100"
  7. "GE 100 LT 200"
  8. "up to 1998: GE 20 LT 200"
  9. "GE 200 LT 2000"
  10. "GE 2000"
  11. "Self-employed without employees"

TIP: The variable ALLBET$$ in the dataset $PGEN offers consistent data on company size throughout all waves of the SOEP, although with fewer categories in a less detailed classification.
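To illustrate how the detailed BETR$$ categories relate to the coarser ALLBET$$ ones, the Python sketch below collapses one into the other. The mapping is an assumption inferred from the category labels listed above, and the numeric codes are simply the list positions; the codes actually stored in the data, including any missing-value codes, may differ.

```python
# Assumed mapping from BETR$$ list positions to ALLBET$$ list positions,
# inferred from the category labels only (not taken from the SOEP codebook).
BETR_TO_ALLBET = {
    1: 1,   # "less than 5"                     -> "less than 20"
    2: 1,   # "GE 5 LE 10"                      -> "less than 20"
    3: 1,   # "11 LT 20"                        -> "less than 20"
    4: 1,   # "up to 1990: LT 20"               -> "less than 20"
    5: 1,   # "1991-2004: 5 LT 20"              -> "less than 20"
    6: 2,   # "GE 20 LT 100"                    -> "20 to 200"
    7: 2,   # "GE 100 LT 200"                   -> "20 to 200"
    8: 2,   # "up to 1998: GE 20 LT 200"        -> "20 to 200"
    9: 3,   # "GE 200 LT 2000"                  -> "200 to 2000"
    10: 4,  # "GE 2000"                         -> "2000 and above"
    11: 5,  # "Self-employed without employees" -> "Self-employed with no other employees"
}


def collapse_betr(code):
    """Return the assumed ALLBET$$ category for a BETR$$ category, or None if unknown."""
    return BETR_TO_ALLBET.get(code)


print(collapse_betr(7))  # 2
```

In practice there is no need to recode by hand, since ALLBET$$ is already provided in $PGEN; the sketch only makes the "least common denominator" idea concrete.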

EMPLST$$ (new):
Employment Status. A consistent variable over all waves to differentiate employment status (in addition to the variable LFS$$, which differentiates non-employed persons).


  1. "Full-time employment"
  2. "Regular part-time employment"
  3. "Vocational training"
  4. "Marginal, irregular part-time employment"
  5. "Not employed"

EXPFT$$ (new):
Work experience in full-time employment. Covers a person's total work experience in full-time employment (in years, to one decimal place).

EXPPT$$ (new):
Work experience in part-time employment. Covers a person's total work experience in part-time employment (in years, to one decimal place).

EXPUE$$ (new):
Unemployment experience. Covers unemployment experience over the entire working life (in years, to one decimal place).

Contact: Silke Anger

$PEQUIV 2005  

SSOLD$$ (new):
Social assistance to the elderly ("Grundsicherung im Alter").

FSSOLD$$ (new):
Imputation flag: Social assistance to the elderly.

LOSSR$$ (new):
Losses from renting and leasing.

FLOSSR$$ (new):
Imputation flag: losses from renting and leasing.

LOSSC$$ (new):
Losses from capital investment.

FLOSSC$$ (new):
Imputation flag: losses from capital investment.

D11112LL (new):
Race of individual.

D11110$$ (removed):
Data already included in the variables M11124$$.

D11111$$ (removed):
Data already included in the variables M11125$$.

Contact: Markus Grabka

Bug fixes  

Correction of [T-U]HPOP in HPFAD.
Correction of some individual and household weights for the years 2003 and 2004 (THHRF, UPHRF, and UHHRF).