SOEP-Core v29 (data 1984-2012)

The German Socio-Economic Panel (SOEP) study is a wide-ranging representative longitudinal study of private households, located at the German Institute for Economic Research, DIW Berlin. Every year, nearly 11,000 households and more than 20,000 persons are surveyed by the fieldwork organization TNS Infratest Sozialforschung. The data provide information on all household members, consisting of Germans living in the Old and New German States, foreigners, and recent immigrants to Germany. The Panel was started in 1984. Topics include household composition, occupational biographies, employment, earnings, health, and satisfaction indicators. As early as June 1990, even before the Economic, Social and Monetary Union, SOEP expanded to include the states of the former German Democratic Republic (GDR), thus seizing the rare opportunity to observe the transformation of an entire society. In 1994/95, an immigrant sample was added to account for the changes that had taken place in German society. Further new samples were added in 1998, 2000, 2002, 2006, 2009, 2011, and 2012. The survey is constantly being adapted and developed in response to current social developments. The international version contains 95% of all cases surveyed.

Dataset Information

Title: German Socio-Economic Panel (SOEP), data of the years 1984-2012

DOI: 10.5684/soep.v29
Collection period: 1984-2012
Publication date: Nov. 06, 2013
Principal investigators: Jürgen Schupp, Martin Kroh, Jan Goebel, Simone Bartsch, Marco Giesselmann, Markus Grabka, Peter Krause, Elisabeth Liebau, David Richter, Christian Schmitt, Daniel Schnitzlein, Frauke Peter, Ingrid Tucci

Data collector: TNS Infratest Sozialforschung GmbH.

Population: Persons living in private households in Germany.

Selection method: All samples of SOEP are multi-stage random samples which are regionally clustered. The respondents (households) are selected by random-walk.

Collection mode: The interview methodology of the SOEP is based on a set of pre-tested questionnaires for households and individuals. In principle, an interviewer tries to obtain face-to-face interviews with all members of a given survey household aged 16 years and over. Additionally, one person (the head of household) is asked to answer a household-related questionnaire covering information on housing, housing costs, and different sources of income. This questionnaire also covers some questions on children in the household up to 16 years of age, mainly concerning attendance at institutions (kindergarten, elementary school, etc.).

 Data set information:

 Number of units 77,934
 Number of variables 50,231 in 376 data sets
 Data format STATA, SPSS, SAS, CSV
 MD5 fingerprints of the data sets
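
A downloaded data set can be checked against its published MD5 fingerprint before use. The following is a minimal sketch in Python, assuming the fingerprint is available as a hexadecimal string; the helper names `md5_of_file` and `matches_fingerprint` are hypothetical and not part of the data distribution.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hexadecimal MD5 digest of the file at `path`."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read in chunks so that large data sets need not fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_fingerprint(path, expected_hex):
    """Compare a file's MD5 digest with a published fingerprint.

    Surrounding whitespace and letter case in `expected_hex` are ignored.
    """
    return md5_of_file(path) == expected_hex.strip().lower()
```

A mismatch usually points to an incomplete or corrupted download rather than to a problem in the data themselves.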

  • Jan Goebel, Markus M. Grabka, Stefan Liebig, Martin Kroh, David Richter, Carsten Schröder, Jürgen Schupp. 2018. The German Socio-Economic Panel Study (SOEP). Jahrbücher für Nationalökonomie und Statistik / Journal of Economics and Statistics (online first), doi: 10.1515/jbnst-2018-0022
  • Gert G. Wagner, Jan Göbel, Peter Krause, Rainer Pischner, and Ingo Sieber (2008) Das Sozio-oekonomische Panel (SOEP): Multidisziplinäres Haushaltspanel und Kohortenstudie für Deutschland - Eine Einführung (für neue Datennutzer) mit einem Ausblick (für erfahrene Anwender), AStA Wirtschafts- und Sozialstatistisches Archiv 2 (4), 301-328 (download)
  • Schupp, Jürgen (2009): 25 Jahre Sozio-oekonomisches Panel - Ein Infrastrukturprojekt der empirischen Sozial- und Wirtschaftsforschung in Deutschland, Zeitschrift für Soziologie 38 (5), pp. 350-357.

Publications using this file should refer to the above DOI and cite one of the following references:

  • Goebel, Jan, Markus M. Grabka, Stefan Liebig, Martin Kroh, David Richter, Carsten Schröder, and Jürgen Schupp. 2019. The German Socio-Economic Panel (SOEP). Jahrbücher für Nationalökonomie und Statistik (Journal of Economics and Statistics) 239 (2), 345-360.
  • Schröder, Carsten, Johannes König, Alexandra Fedorets, Jan Goebel, Markus M. Grabka, Holger Lüthen, Maria Metzing, Felicitas Schikora, and Stefan Liebig. 2020. The economic research potentials of the German Socio-Economic Panel study. German Economic Review 21 (3), 335-371.
  • Giesselmann, Marco, Sandra Bohmann, Jan Goebel, Peter Krause, Elisabeth Liebau, David Richter, Diana Schacht, Carsten Schröder, Jürgen Schupp, and Stefan Liebig. 2019. The Individual in Context(s): Research Potentials of the Socio-Economic Panel Study (SOEP) in Sociology. European Sociological Review 35 (5), 738-755.

The new data distribution (1984-2012) "SOEP v29" provides, for the most recent survey year 2012, the usual wave-specific data files BCPBRUTTO, BCP, BCPKAL, BCPGEN, BCPAGE17, BCHBRUTTO, BCH, BCHGEN, BCKIND, and BBPLUECKE as well as the updated files with a longitudinal component (PFAD files, biography files, spell data, and weighting factors).
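
The wave-specific file names follow from the survey year: these notes refer to wave L for 1995, wave BB for 2011, and wave BC for 2012. The sketch below derives the prefix from the year under the assumption that waves run A to Z for 1984 to 2009 and continue with BA, BB, and so on from 2010; the function names and the default list of suffixes are illustrative only.

```python
import string

FIRST_YEAR = 1984  # wave A, the first SOEP survey year


def wave_prefix(year):
    """Return the wave prefix for a survey year, e.g. 1995 -> 'L', 2012 -> 'BC'.

    Assumption: single letters A..Z cover 1984-2009, followed by BA, BB, ...
    """
    index = year - FIRST_YEAR
    if index < 0:
        raise ValueError("SOEP waves start in 1984")
    letters = string.ascii_uppercase
    if index < 26:
        return letters[index]
    if index >= 52:
        raise ValueError("naming beyond the 'B' series is not covered by this sketch")
    return "B" + letters[index - 26]


def wave_files(year, suffixes=("P", "PGEN", "H", "HGEN")):
    """Build wave-specific data set names by prepending the wave prefix."""
    prefix = wave_prefix(year)
    return [prefix + suffix for suffix in suffixes]
```

For example, `wave_files(2012)` yields `BCP`, `BCPGEN`, `BCH`, and `BCHGEN`, four of the files listed above.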

1. New subsample K

In 2012, we added a new refreshment sample with 1,526 new households (Sample K). In total, 12,322 households were interviewed as part of the 2012 fieldwork. As with previous general population samples, refreshment sample K was realized using a multi-stage stratified sampling design. Its response rate of 34.7% was very similar to that of our last refreshment sample, J. Thus, the general downward trend in participation was successfully stopped through a range of measures including centralized face-to-face interviewer training, better pay for interviewers, and more attractive incentives for respondents.

In the current refreshment sample, fieldwork is conducted exclusively by CAPI, as it was with the previous refreshments H (2006), I (2009), and J (2011). As in our other refreshment samples, data collection is focused on three main questionnaires: the household, the individual, and the youth questionnaire. No supplementary questionnaires were used with respondents in wave 1. The reason for focusing on the key questionnaires is to avoid "overburdening" respondents with a lengthy wave 1 interview.

2. Revision of the weighting and estimation procedure

In version SOEP v29 of the SOEP data, the data from subsamples J and K (first collected in 2011 and 2012, respectively) have been adjusted to the German Microcensus for the number of employed people in households of different sizes as well as for the number of private households receiving Unemployment Benefit II (ALG II). This correction prevents an overestimation of households receiving ALG II in the unweighted samples J and K.
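
The general idea of adjusting sample weights to an external benchmark can be illustrated with a simple one-margin post-stratification step. This is a sketch only: the actual SOEP weighting procedure is not specified in these notes, and the cell labels, weights, and benchmark totals used with this function are made up for illustration.

```python
from collections import defaultdict


def poststratify(records, benchmarks):
    """Rescale weights so that weighted counts per cell match benchmark totals.

    records:    list of (cell, weight) pairs, one per household
    benchmarks: dict mapping each cell to its external population total
                (for example, a total taken from the Microcensus)
    """
    weighted = defaultdict(float)
    for cell, weight in records:
        weighted[cell] += weight
    # One adjustment factor per cell: benchmark total / current weighted total.
    factors = {cell: benchmarks[cell] / total for cell, total in weighted.items()}
    return [(cell, weight * factors[cell]) for cell, weight in records]
```

If a cell such as "households receiving ALG II" is overrepresented, its factor is below one and the weights of those households shrink accordingly.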

Also, for all newly drawn samples since 1998, a minor adjustment has been made to the definition of households containing foreign nationals. The criterion is no longer the household head but the presence of at least one person of foreign nationality in the household. The revision was made due to a slightly increasing discrepancy between the reference person chosen in the German Microcensus and the household head in the SOEP.
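
The difference between the two criteria can be made concrete with a small sketch. The record layout (`nationality`, `is_head`) and the function names are hypothetical and do not correspond to SOEP variable names.

```python
def head_is_foreign(household, home_nationality="German"):
    """Old criterion: classify the household by the nationality of its head."""
    head = next(member for member in household if member["is_head"])
    return head["nationality"] != home_nationality


def has_foreign_national(household, home_nationality="German"):
    """Revised criterion: at least one member holds a foreign nationality."""
    return any(member["nationality"] != home_nationality for member in household)
```

A household with a German head and a spouse of foreign nationality is classified differently under the two rules: the old criterion returns `False`, the revised one `True`.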

3. New datasets / variables

  • In 2012, the SOEP replicated its wealth module for the fourth time after 1988, 2002, and 2007. Due to the higher response burden for first-wave respondents, we did not survey wealth in the most recent refreshment sample K (N=1506 households). For the estimation of totals, we therefore recommend using the cross-sectional household and person weights that cover the "old" samples A through J only and exclude wave 1 units from Sample K, i.e. BCHRFAJ and BCPHRFAJ.
  • COGNIT: For the short cognitive tests implemented in the survey year 2006 we can now provide the first repeat, including an additional word knowledge test. The name of the dataset changed from COGNIT06 to COGNIT, because both survey years are now included in long format. A detailed documentation of the first test can be found in Schupp et al. (2008) Erfassung kognitiver Leistungspotentiale Erwachsener im Sozio-oekonomischen Panel (SOEP), DIW Berlin, Data Documentation 32 (PDF, 447.63 KB).
  • Two new variables in $PGEN: The variable SNDJOB$$ represents the imputed current gross labor income from a second job, generated for all SOEP respondents who are employed in each respective wave. Information about gross income from the second job was first asked in 1995 (wave L). The respective imputation flag is the variable IMPSND$$.
  • For the first time, respondents were asked their place of birth. This information including the coordinates of the respective municipality is available at our guest workstations at the Research Data Center SOEP.
  • A new dataset HCONSUM with generated data from the consumption module used in the SOEP in the year 2010. A detailed documentation (PDF, 1.5 MB) is available online.
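
An estimated total for the wealth module is the sum of each household's value multiplied by its cross-sectional weight. The sketch below uses the weight name BCHRFAJ from the recommendation above; the column name `wealth` and the row layout are invented for illustration, and it is an assumption that households without a usable weight (such as those from Sample K) carry a missing or zero value in that variable.

```python
def weighted_total(rows, value_key, weight_key="BCHRFAJ"):
    """Estimate a population total as the sum of weight * value over households.

    rows: list of dicts, one per household.
    Households with a missing value or a missing/non-positive weight are
    skipped (assumption: this is how units outside samples A-J appear).
    """
    total = 0.0
    for row in rows:
        weight = row.get(weight_key)
        value = row.get(value_key)
        if weight is None or weight <= 0 or value is None:
            continue
        total += weight * value
    return total
```

Negative values are kept deliberately, since net wealth can be below zero; any missing-value codes in the real data would need to be set to `None` beforehand.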


4. Improvements and Bug Fixes

  • Revision of the $STELL codes (Relationship to the head of household) to differentiate between biological child, stepchild, adoptive child, etc.:
Value  Wave BB (2011)             Wave BC (2012)
    0  Head Of Household          Head Of Household
    1  Spouse Of HH Head
    2  Life Partner
    3  Son, Daughter
    4  Foster Child
    5  Son, Daughter-In-Law
    6  Father, Mother
    7  Parent-In-Law
    8  Brother, Sister,-In Law
    9  Grandchild
   10  Other Relative
   11  Non-Relative               Spouse Of HH Head
   12  Child of HH-Heads Partner  Same-Sex Spouse
   13  Same-Sex Spouse            Life Partner
   21                             Son, Daughter
   22                             Stepchild (Child of the Partner)
   23                             Adoptive Child
   24                             Foster Child
   25                             Grandchild
   26                             Great-Grandchild
   27                             Son, Daughter-In-Law
   31                             Father, Mother
   32                             Step Father / Step Mother / Spouse of Father or Mother
   33                             Adoptive Father or Mother
   34                             Foster Father or Mother
   35                             Parent-In-Law
   36                             Grandparents
   41                             Brother, Sister
   42                             Half-Brother, Half-sister
   43                             Stepbrother, Stepsister
   44                             Adoptive Brother/Sister
   45                             Foster Brother/Sister
   51                             Brother, Sister -in Law (spouse of brother/sister)
   52                             Brother, Sister -in Law (brother/sister of spouse)
   61                             Aunt, Uncle
   62                             Niece/ Nephew
   63                             Cousin/Cousine
   64                             Other Relative
   71                             Others
   99  Unknown                    Unknown

Please note that this also affects the corresponding variables in the dataset $KIND (and KIDLONG) and BIOPAREN.
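
Because the values 11, 12, and 13 carry different meanings in the two schemes, codes from different waves should not be compared without recoding. The sketch below maps wave BB codes to wave BC codes by matching the labels in the table above. This label-based crosswalk is an assumption, not an official recoding; codes 8 and 11 have no unique counterpart by label in the new scheme and are deliberately left unmapped.

```python
# Wave BB (2011) code -> wave BC (2012) code, matched on the table labels.
BB_TO_BC = {
    0: 0,    # Head Of Household
    1: 11,   # Spouse Of HH Head
    2: 13,   # Life Partner
    3: 21,   # Son, Daughter
    4: 24,   # Foster Child
    5: 27,   # Son, Daughter-In-Law
    6: 31,   # Father, Mother
    7: 35,   # Parent-In-Law
    9: 25,   # Grandchild
    10: 64,  # Other Relative
    12: 22,  # Child of HH-Heads Partner -> Stepchild (Child of the Partner)
    13: 12,  # Same-Sex Spouse
    99: 99,  # Unknown
}


def recode_stell(bb_code):
    """Return the wave BC code for a wave BB code, or None if it is ambiguous."""
    return BB_TO_BC.get(bb_code)
```

Returning `None` for unmapped codes forces an explicit decision for those cases instead of silently assigning a category.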

  • The dataset KIDLONG no longer contains additional variables on birth date (GEBJAHR and GEBMONAT) and sex (SEX). Please use the more intensively checked versions in PPFAD instead.
  • Last year, we already provided the interviewer data with a new variable, INTID, which is unified across all waves and takes the place of the respective file-specific variables ($INTNR). The new variable was determined through one-time generation of a random number; it is therefore fixed and remains consistent in an integrated master file (not contained in the data distribution) for SOEPcore as well as for FiD (Families in Germany) and SOEP-IS (innovation panel). In addition to generating the INTIDs and updating the interviewer characteristics in INTVIEW, we have also made the following revisions:
    • The dataset INTVIEW no longer contains just the interviewer with interviewer characteristics but also all available interviewer numbers. To provide this information, we extracted all interviewer numbers from all available datasets. Flag variables in INTVIEW show whether other interviewer characteristics are available for this particular INTID or not.
    • A total of 181 INTIDs were newly assigned in the updated data, allowing these to be directly linked with the respective interviewer characteristics. This is due to the assignment of numbers by Infratest in East Germany from 1990 to 1995, when there were still some independent interviewers (IBB-numbers) for the East sample whose numbers were assigned according to a different system. These had to be harmonized with the interviewer numbers that were merged later.
  • BIOAGE03: The codes for personality were changed from 1-11 to 0-10 and are now consistent with the codes for personality in BIOAGE06.
  • BIOAGE06: in 2008, for personality, the value zero was mistakenly coded -2. This mistake was corrected. This resulted in up to 65 additional valid cases for some traits in the survey year 2008.
  • $FAMSTD: In generating current marital status, current and previous year were switched for some cases in 2011 in v28.
  • In 2012, the questionnaire provides one-time-only information on the size of the local establishment in addition to the size of the entire company (BETR$$). The enriched questionnaire revealed that in previous interviews, some individuals mistakenly provided information on the local establishment size instead of the entire company size, especially if their entire company had 2000 or more employees. Due to the importance of longitudinal consistency, these persons were identified, and their 2012 original value of the entire company size BETR12 was replaced by their value of the local establishment size. These modifications also affected the variable ALLBET12. Please see the data documentation for further details.
  • The variable RUEBSTD ("overtime hours during last month" in 2001) had cases with incorrect non-response missings (-1), since respondents without overtime mistakenly were assigned to this category. In the corrected version, the value for these respondents is correctly coded as zero overtime hours.
  • For the variable vh4601 and the equivalent variables in the following years, the label read "contributions over 2,500 euros", although the questionnaire actually asked about "contributions over 500 euros". The label was corrected.
  • The variables ZERWZEIT and BAERWZEIT ("length of time with firm" in 2009 and 2010) had to be corrected for respondents in sample I who did not have their wave 2009 interview and wave 2010 interview in the respective year but at the beginning of the following year (2010 and 2011). Due to the longitudinal consistency check, these individuals mistakenly received an implausible value (-3) for BAERWZEIT. In the corrected version, the non-missing values of these respondents are considered to be valid and not set to missing.
  • LOC1989: In generating the data, persons who never participated are now included. As a result, the value -2 now means "does not apply, born before 1989", as planned for this variable. For respondents who never participated and for whom no information could be gathered from other sources, the value was set to -1 ("no answer").
  • The variables EXPFT$$, EXPPT$$, and EXPUE$$ (experience in full-time employment, part-time employment, and unemployment) have been improved. The variables now reflect the total length of full-time employment, part-time employment, and unemployment in the respondent's career up to the point of the interview in a given year (instead of only up to December of the previous year). Since monthly employment activities are asked retrospectively in the following year, the variables cannot be updated for the most current wave.
  • The variable AHINC$$ in dataset $HGEN is no longer part of the data distribution. We recommend using the completely (multiple) imputed monthly net household income from the variables I$HINC$$ (or the dataset MIHINC in long format over all years).
  • The variables ATATZEIT, AVEBZEIT, AUEBSTD and AERWZEIT were mixed up in the data distribution v28 and had to be corrected:
    • The correct values of ATATZEIT were found in the variable AERWZEIT.
    • The correct values of AVEBZEIT were found in the variable ATATZEIT.
    • The correct values of AUEBSTD were found in the variable AVEBZEIT.
    • The correct values of AERWZEIT were found in the variable AERWZEIT of the data distribution v27.
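
For users who still hold extracts from the earlier distributions, the mix-up described above can be undone by reassigning the values. The following is a minimal sketch, assuming one dict per respondent from v28 and a matching dict from v27; the function name and the record layout are hypothetical. The v29 distribution already contains the corrected variables.

```python
def restore_wave_a_variables(v28_row, v27_row):
    """Rebuild the four affected wave A variables for one respondent.

    v28_row: the respondent's record from data distribution v28 (mixed up)
    v27_row: the same respondent's record from data distribution v27
    """
    return {
        "ATATZEIT": v28_row["AERWZEIT"],  # correct ATATZEIT was stored in AERWZEIT
        "AVEBZEIT": v28_row["ATATZEIT"],  # correct AVEBZEIT was stored in ATATZEIT
        "AUEBSTD": v28_row["AVEBZEIT"],   # correct AUEBSTD was stored in AVEBZEIT
        "AERWZEIT": v27_row["AERWZEIT"],  # only v27 holds the correct AERWZEIT
    }
```

Building a new dict rather than overwriting the v28 record in place avoids losing a value that is still needed as the source for another variable.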

1984-2012 (Wave BC)

Mar. 27, 2014


Errors in the imputation of electricity, heating, and additional expenses for tenants in the current data distribution resulted in values that were too high. These errors also affected the generation of rent including maintenance but excluding heating. The variables affected are: electr$$, heat$$, util$$, rent$$, and frent$$ for the years 2008 to 2012. The variables typ1hh12 and typ2hh12 changed for two households.


Also in the 2012 survey year, after the suspension of compulsory military service in Germany, the related calendar information in the individual questionnaire was revised. This revision was made in the original individual data for 2012 but not in the corresponding calendar data; the calendar data have now been updated retrospectively for the data distribution v29.

Both errors were corrected, and an update is now available for download upon request. If you would like to use this updated version in your work, please cite the version number, SOEP v29.1 (or better, doi: 10.5684/soep.v29.1), in publications using these data.

Please find all sample specific questionnaires of this year and all questionnaires of previous years on this site

