SOEP-Core v29 - Changes in the Dataset

Dataset Information

The new data distribution (1984-2012) "SOEP v29" provides, for the most recent survey year 2012, the usual wave-specific data files BCPBRUTTO, BCP, BCPKAL, BCPGEN, BCPAGE17, BCHBRUTTO, BCH, BCHGEN, BCKIND, and BBPLUECKE as well as the updated files with a longitudinal component (PFAD files, biography files, spell data, and weighting factors).

1. New subsample K

In 2012, we added a new refreshment sample with 1,526 new households (Sample K). In total, 12,322 households were interviewed as part of the 2012 fieldwork. As with previous general population samples, refreshment sample K was realized using a multi-stage stratified sampling design. Its response rate of 34.7% was very similar to that of our last refreshment sample J. Thus, the general downward trend in participation was successfully stopped through a range of measures including centralized face-to-face interviewer training, better pay for interviewers, and more attractive incentives for respondents.

In the current refreshment samples, fieldwork is conducted exclusively by CAPI, as it was with the previous refreshments H (2006), I (2009), and J (2011). Similarly to our other refreshment samples, data collection is focused on three main questionnaires: the household, the individual, and the youth questionnaire. Thus, no supplementary questionnaires were used with respondents in wave 1. The reason for focusing on the key questionnaires is to avoid "overburdening" respondents with a lengthy wave 1 interview.

2. Revision of the weighting and estimation procedure

In SOEP v29, the data from subsamples J and K (first collected in 2011 and 2012, respectively) have been adjusted to the German Microcensus for the number of employed people in households of different sizes as well as for the number of private households receiving Unemployment Benefit II (ALG II). This correction prevents an overestimation of households receiving ALG II in the unweighted samples J and K.

Also, for all newly drawn samples since 1998, a minor adjustment has been made to the definition of households containing foreign nationals. The criterion is no longer the household head but the presence of at least one person of foreign nationality in the household. The revision was made due to a slightly increasing discrepancy between the reference person chosen in the German Microcensus and the household head in the SOEP.
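The difference between the two criteria can be illustrated with a minimal sketch. The record layout (`is_head`, `nationality`) and the code "DE" for German nationality are assumptions made for illustration only; they are not SOEP variable names, and the sketch is not the procedure used to build the weights.

```python
# Hypothetical member records; field names are illustrative, not SOEP variables.
def foreign_by_head(members):
    """Old criterion: the household head holds a foreign nationality."""
    return any(m["is_head"] and m["nationality"] != "DE" for m in members)


def foreign_by_any_member(members):
    """Revised criterion: at least one member holds a foreign nationality."""
    return any(m["nationality"] != "DE" for m in members)


# A household with a German head and one member of foreign nationality.
household = [
    {"is_head": True, "nationality": "DE"},
    {"is_head": False, "nationality": "IT"},
]

print(foreign_by_head(household))        # False
print(foreign_by_any_member(household))  # True
```

Under the revised criterion, this household counts as containing foreign nationals regardless of which member is treated as head or reference person.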

3. New datasets / variables

  • In 2012, the SOEP replicated its wealth module for the fourth time after 1988, 2002, and 2007. Due to the higher response burden for first-wave respondents, we did not survey wealth in the most recent refreshment sample K (N=1506 households). For the estimation of totals, we therefore recommend using the cross-sectional household and person weights that cover the "old" samples A through J only and exclude wave 1 units from Sample K, i.e. BCHRFAJ and BCPHRFAJ.
  • COGNIT: For the short cognitive tests implemented in the survey year 2006, we can now provide the first repeat, including an additional word knowledge test. The name of the dataset changed from COGNIT06 to COGNIT because both survey years are now included in long format. Detailed documentation of the first test can be found in Schupp et al. (2008) Erfassung kognitiver Leistungspotentiale Erwachsener im Sozio-oekonomischen Panel (SOEP), DIW Berlin, Data Documentation 32.
  • Two new variables in $PGEN: The variable SNDJOB$$ represents the imputed current gross labor income from a second job, generated for all SOEP respondents who are employed in each respective wave. Information about gross income from the second job was first asked in 1995 (wave L). The respective imputation flag is the variable IMPSND$$.
  • For the first time, respondents were asked their place of birth. This information, including the coordinates of the respective municipality, is available at our guest workstations at the Research Data Center SOEP.
  • HCONSUM: A new dataset with generated data from the consumption module used in the SOEP in 2010. Detailed documentation is available online.
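The recommendation on wealth totals can be sketched as follows. This is a minimal illustration, not an official estimation routine: the records, the field names `sample` and `wealth`, and all values are hypothetical, and the sketch assumes that households not covered by BCHRFAJ (here: Sample K) carry a weight of zero.

```python
# Hypothetical household records. Only the weight name BCHRFAJ comes from the
# release notes; all other field names and all values are made up.
households = [
    {"sample": "A", "wealth": 100000.0, "BCHRFAJ": 2000.0},
    {"sample": "J", "wealth": 50000.0, "BCHRFAJ": 3000.0},
    {"sample": "K", "wealth": None, "BCHRFAJ": 0.0},  # wealth not surveyed
]


def weighted_total(rows, value_key, weight_key):
    """Sum value * weight over rows with a positive weight and a non-missing value."""
    total = 0.0
    for row in rows:
        value = row[value_key]
        weight = row[weight_key]
        if value is None or weight <= 0:
            continue  # skips units that the chosen weight does not cover
        total += value * weight
    return total


print(weighted_total(households, "wealth", "BCHRFAJ"))  # 350000000.0
```

Using a weight that also covers Sample K would assign population mass to households for which no wealth information exists, which is why the A-through-J weights are the ones recommended above.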


4. Improvements and Bug Fixes

  • Revision of the $STELL codes (relationship to the head of household) to differentiate between biological child, stepchild, adoptive child, etc.:
Value  Wave BB (2011)             Wave BC (2012)
    0  Head Of Household          Head Of Household
    1  Spouse Of HH Head
    2  Life Partner
    3  Son, Daughter
    4  Foster Child
    5  Son, Daughter-In-Law
    6  Father, Mother
    7  Parent-In-Law
    8  Brother, Sister,-In Law
    9  Grandchild
   10  Other Relative
   11  Non-Relative               Spouse Of HH Head
   12  Child of HH-Heads Partner  Same-Sex Spouse
   13  Same-Sex Spouse            Life Partner
   21                             Son, Daughter
   22                             Stepchild (Child of the Partner)
   23                             Adoptive Child
   24                             Foster Child
   25                             Grandchild
   26                             Great-Grandchild
   27                             Son, Daughter-In-Law
   31                             Father, Mother
   32                             Step Father / Step Mother / Spouse of Father or Mother
   33                             Adoptive Father or Mother
   34                             Foster Father or Mother
   35                             Parent-In-Law
   36                             Grandparents
   41                             Brother, Sister
   42                             Half-Brother, Half-sister
   43                             Stepbrother, Stepsister
   44                             Adoptive Brother/Sister
   45                             Foster Brother/Sister
   51                             Brother, Sister -in Law (spouse of brother/sister)
   52                             Brother, Sister -in Law (brother/sister of spouse)
   61                             Aunt, Uncle
   62                             Niece/ Nephew
   63                             Cousin/Cousine
   64                             Other Relative
   71                             Others
   99  Unknown                    Unknown

Please note that this also affects the corresponding variables in the dataset $KIND (and KIDLONG) and BIOPAREN.
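Because codes 11 to 13 carry different meanings in the two waves, analyses that pool waves need to bring the codes onto a common scheme first. The following is a minimal sketch of one way to map wave BC codes back to the coarser wave BB scheme. The crosswalk is an illustration derived only from labels that match in the table above (code 22 is treated as the counterpart of the old code 12, which is an assumption); it is not an official SOEP crosswalk. New categories without a matching old label return `None`, so that the analyst has to decide how to treat them.

```python
# Wave BC (2012) code -> wave BB (2011) code, restricted to matching labels.
# Illustrative only; not an official SOEP crosswalk.
BC_TO_BB = {
    0: 0,    # Head Of Household
    11: 1,   # Spouse Of HH Head
    12: 13,  # Same-Sex Spouse
    13: 2,   # Life Partner
    21: 3,   # Son, Daughter
    22: 12,  # Stepchild (Child of the Partner); assumed equal to old code 12
    24: 4,   # Foster Child
    25: 9,   # Grandchild
    27: 5,   # Son, Daughter-In-Law
    31: 6,   # Father, Mother
    35: 7,   # Parent-In-Law
    64: 10,  # Other Relative
    99: 99,  # Unknown
}


def to_bb_code(bc_code):
    """Return the 2011-style code, or None if no matching old label exists."""
    return BC_TO_BB.get(bc_code)


print(to_bb_code(11))  # 1
print(to_bb_code(23))  # None
```

Codes such as 23 (Adoptive Child) or 41 (Brother, Sister) are deliberately left unmapped here, since the old scheme either lacks the category or combines it with others.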

  • The dataset KIDLONG no longer contains additional variables on birth date (GEBJAHR and GEBMONAT) and sex (SEX). Please use the more intensively checked versions in PPFAD instead.
  • Last year, we already provided the interviewer data with a new variable, INTID, which is unified across all waves and takes the place of the respective file-specific variables ($INTNR). The new variable was determined through one-time generation of a random number; it is therefore fixed and remains consistent in an integrated master file (not contained in the data distribution) for SOEPcore as well as for FiD (Families in Germany) and SOEP-IS (innovation panel). In addition to generating the INTIDs and updating the interviewer characteristics in INTVIEW, we have also made the following revisions:
    • The dataset INTVIEW no longer contains just the interviewer with interviewer characteristics but also all available interviewer numbers. To provide this information, we extracted all interviewer numbers from all available datasets. Flag variables in INTVIEW show whether other interviewer characteristics are available for this particular INTID or not.
    • A total of 181 INTIDs were newly assigned in the updated data, allowing these to be directly linked with the respective interviewer characteristics. This is due to the assignment of numbers by Infratest in East Germany from 1990 to 1995, when there were still some independent interviewers (IBB-numbers) for the East sample whose numbers were assigned according to a different system. These had to be harmonized with the interviewer numbers that were merged later.
  • BIOAGE03: The codes for personality were changed from 1-11 to 0-10 and are now consistent with the codes for personality in BIOAGE06.
  • BIOAGE06: in 2008, for personality, the value zero was mistakenly coded -2. This mistake was corrected. This resulted in up to 65 additional valid cases for some traits in the survey year 2008.
  • $FAMSTD: In generating current marital status, the current and the previous year were switched for some cases in 2011 in v28. This has been corrected.
  • In 2012, the questionnaire provides one-time-only information on the size of the local establishment in addition to the size of the entire company (BETR$$). The enriched questionnaire revealed that in previous interviews, some individuals mistakenly provided information on the local establishment size instead of the entire company size, especially if their entire company had 2000 or more employees. Due to the importance of longitudinal consistency, these persons were identified, and their 2012 original value of the entire company size BETR12 was replaced by their value of the local establishment size. These modifications also affected the variable ALLBET12. Please see the data documentation for further details.
  • The variable RUEBSTD ("overtime hours during last month" in 2001) had cases with incorrect non-response missings (-1), since respondents without overtime mistakenly were assigned to this category. In the corrected version, the value for these respondents is correctly coded as zero overtime hours.
  • For the variable vh4601 and the equivalent variables in the following years, the label "contributions over 2,500 euros" was used, although the questionnaire actually asked for "contributions over 500 euros". The label was corrected.
  • The variables ZERWZEIT and BAERWZEIT ("length of time with firm" in 2009 and 2010) had to be corrected for respondents in sample I who did not have their wave 2009 interview and wave 2010 interview in the respective year but at the beginning of the following year (2010 and 2011). Due to the longitudinal consistency check, these individuals mistakenly received an implausible value (-3) for BAERWZEIT. In the corrected version, the non-missing values of these respondents are considered to be valid and not set to missing.
  • LOC1989: Persons who never participated are now included when the data are generated. As a result, the value -2 now means "does not apply, born before 1989", as planned for this variable. Respondents who have never participated and for whom no information could be gathered from other sources were set to -1 ("no answer").
  • The variables EXPFT$$, EXPPT$$, and EXPUE$$ (experience in full-time employment, part-time employment, and unemployment) have been improved. The variables now reflect the total length of full-time employment, part-time employment, and unemployment in the respondent's career up to the point of the interview in a given year (instead of only up to December of the previous year). Since monthly employment activities are asked retrospectively in the following year, the variables cannot be updated for the most current wave.
  • The variable AHINC$$ in dataset $HGEN is no longer part of the data distribution. We recommend using the completely (multiple) imputed monthly net household income from the variables I$HINC$$ (or dataset MIHINC in long format over all years).
  • The variables ATATZEIT, AVEBZEIT, AUEBSTD and AERWZEIT were mixed up in the data distribution v28 and had to be corrected:
    • The correct values of ATATZEIT were found in the variable AERWZEIT.
    • The correct values of AVEBZEIT were found in the variable ATATZEIT.
    • The correct values of AUEBSTD were found in the variable AVEBZEIT.
    • The correct values of AERWZEIT were found in the variable AERWZEIT of the data distribution v27.
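For users who need to reconcile results computed with v28, the variable mix-up described above can be written down as a small mapping. This is only a sketch of the stated correspondence: the function name and the example values are hypothetical, and a record is represented as a plain dictionary. In v29 the variables are already corrected, so no such step is needed there.

```python
def correct_v28_record(v28, v27):
    """Rebuild the four variables from a v28 record and the matching v27 record."""
    return {
        "ATATZEIT": v28["AERWZEIT"],  # correct values were stored in AERWZEIT
        "AVEBZEIT": v28["ATATZEIT"],  # correct values were stored in ATATZEIT
        "AUEBSTD": v28["AVEBZEIT"],   # correct values were stored in AVEBZEIT
        "AERWZEIT": v27["AERWZEIT"],  # taken from data distribution v27
    }


# Hypothetical values for one respondent, for illustration only.
v28 = {"ATATZEIT": 38.0, "AVEBZEIT": 2.0, "AUEBSTD": 40.0, "AERWZEIT": 41.5}
v27 = {"AERWZEIT": 5.2}

print(correct_v28_record(v28, v27))
# {'ATATZEIT': 41.5, 'AVEBZEIT': 38.0, 'AUEBSTD': 2.0, 'AERWZEIT': 5.2}
```

Building a new dictionary rather than reassigning values in place avoids overwriting a variable before its old content has been copied elsewhere.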