SOEP-Core v24 (data 1984-2007)

The German Socio-Economic Panel Study (SOEP) is a wide-ranging representative longitudinal study of private households, located at the German Institute for Economic Research, DIW Berlin. Every year, nearly 11,000 households and more than 20,000 persons were sampled by the fieldwork organization TNS Infratest Sozialforschung. The data provide information on all household members, consisting of Germans living in the old and new German states, foreigners, and recent immigrants to Germany. The panel was started in 1984. Topics include household composition, occupational biographies, employment, earnings, health, and satisfaction indicators.
As early as June 1990, even before the Economic, Social and Monetary Union, SOEP expanded to include the states of the former German Democratic Republic (GDR), thus seizing the rare opportunity to observe the transformation of an entire society. An immigrant sample was added in 1994/95 to account for the changes that had taken place in German society. Further new samples were added in 1998, 2000, 2002, and 2006. The survey is constantly being adapted and developed in response to current social developments.

Dataset Information

Title: Socio-Economic Panel (SOEP), data for the years 1984–2007

DOI: 10.5684/soep.v24
Collection period: 1984–2007
Publication date: July 7, 2008
Principal investigators: Gert G. Wagner, Joachim R. Frick, Jürgen Schupp, Silke Anger, Jan Goebel, Markus M. Grabka, Olaf Groh-Samberg, Elke Holst, Peter Krause, Martin Kroh, Henning Lohmann, Rainer Pischner, Christian Schmitt, C. Katharina Spieß, Martin Spieß

Data collection: TNS Infratest Sozialforschung GmbH

Population: Persons living in private households in the Federal Republic of Germany

Sampling procedure: All SOEP samples are drawn using multi-stage, regionally clustered sampling. The respondents (households) are selected by random walk.

Survey method: SOEP data collection is based on a set of questionnaires for households and for individuals. In principle, an interviewer attempts to conduct face-to-face interviews with all household members aged 16 or older. In addition, one person (the head of household) is asked to answer a household questionnaire, which covers the housing situation, costs, and various sources of income, as well as questions about children under 16 living in the household (e.g., attendance at kindergarten, primary school, etc.).

Dataset information:

Number of units: 61,544
Number of variables: 39,550 in 297 datasets
Data formats: Stata, SPSS, SAS, CSV


  • Gert G. Wagner, Richard V. Burkhauser, and Friederike Behringer. 1993. The English Language Public Use File of the German Socio-Economic Panel Study. The Journal of Human Resources 28 (2), 429-433.
  • Jan Goebel, Markus M. Grabka, Stefan Liebig, Martin Kroh, David Richter, Carsten Schröder, Jürgen Schupp. 2018. The German Socio-Economic Panel Study (SOEP). Jahrbücher für Nationalökonomie und Statistik / Journal of Economics and Statistics (online first), doi: 10.1515/jbnst-2018-0022

Publications using this file should refer to the above DOI and cite one of the following references:

  • Goebel, Jan, Markus M. Grabka, Stefan Liebig, Martin Kroh, David Richter, Carsten Schröder, and Jürgen Schupp. 2019. The German Socio-Economic Panel (SOEP). Jahrbücher für Nationalökonomie und Statistik (Journal of Economics and Statistics) 239 (2), 345-360.
  • Schröder, Carsten, Johannes König, Alexandra Fedorets, Jan Goebel, Markus M. Grabka, Holger Lüthen, Maria Metzing, Felicitas Schikora, and Stefan Liebig. 2020. The economic research potentials of the German Socio-Economic Panel study. German Economic Review 21 (3), 335-371.
  • Giesselmann, Marco, Sandra Bohmann, Jan Goebel, Peter Krause, Elisabeth Liebau, David Richter, Diana Schacht, Carsten Schröder, Jürgen Schupp, and Stefan Liebig. 2019. The Individual in Context(s): Research Potentials of the Socio-Economic Panel Study (SOEP) in Sociology. European Sociological Review 35 (5), 738-755.

The 2008 data distribution (1984-2007) provides, for the year 2007, the usual wave-specific data XPBRUTTO, XP, XPKAL, XPGEN, XHBRUTTO, XH, XHGEN, XKIND and WPLUECKE as well as the updated files with a longitudinal component (PFAD files, biography files, spell data and weighting factors).

In the survey year 2006, a representative supplementary sample for all of Germany was added: refreshment sample H. Biographical background information was collected from respondents in sample H for the first time in 2007. These data have been fully integrated into all relevant biography files (BIOxxxx).

As part of the SOEP innovation projects, TNS Infratest Sozialforschung conducted a postal survey in December 2006 among former SOEP panel members from households that had been classified as final refusals in 2001-2004. As a byproduct, the year of birth could be changed from missing to a valid value for 21 of these persons (more information can be found in the executive summary of the TNS Infratest Methodenbericht).

Furthermore, the following additions and modifications have been made:

A. New and Renamed Datasets

In the 2006 survey year, short cognitive tests were carried out for the first time with a subsample of the SOEP. The goal was to employ a robust set of instruments that could be administered easily by trained interviewers in just a few minutes. Close to 80% of all persons chosen for participation in the cognitive test provided valid answers. Thus, for the first time, the SOEP now contains indicators of cognitive potentials for more than 5,500 persons, along with diverse educational information based on degrees and certifications. The first repeat of the test is planned for the 2010 survey year. Detailed documentation and selection analyses can be found in Schupp et al. (2008), Erfassung kognitiver Leistungspotentiale Erwachsener im Sozio-oekonomischen Panel (SOEP), DIW Berlin, Data Documentation 32.

These two datasets replace the former dataset YPBRUTTO; however, this year both variants are available.

Multiply imputed dataset on monthly net household income for the years 1996 to 2007 (MIHINC). The dataset is stored in long format (hhnrakt, svyyear, mj; also called mim format in Stata). Each case of item non-response on net household income was imputed 10 times. More information can be found in HGEN.pdf.
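
Analyses of multiply imputed data are usually run once per imputation and then pooled. The following is a minimal sketch of that idea for a simple mean, assuming that mj indexes the imputation and that the imputed income sits in a column called hinc (a hypothetical name; HGEN.pdf documents the actual variable). The toy rows are made up for illustration.

```python
from collections import defaultdict
from statistics import mean


def mean_income_by_imputation(rows, year, value_key="hinc"):
    """Compute the mean income separately for each imputation (mj).

    `rows` are dicts in long format with the keys hhnrakt, svyyear, mj.
    `value_key` is a hypothetical column name for the imputed income.
    """
    by_mj = defaultdict(list)
    for row in rows:
        if row["svyyear"] == year:
            by_mj[row["mj"]].append(row[value_key])
    return {mj: mean(values) for mj, values in by_mj.items()}


def pooled_point_estimate(per_imputation):
    """Pool the per-imputation estimates by averaging them."""
    return mean(list(per_imputation.values()))


# Made-up toy data: two households, two imputations each.
toy_rows = [
    {"hhnrakt": 1, "svyyear": 2007, "mj": 1, "hinc": 2000},
    {"hhnrakt": 1, "svyyear": 2007, "mj": 2, "hinc": 2200},
    {"hhnrakt": 2, "svyyear": 2007, "mj": 1, "hinc": 3000},
    {"hhnrakt": 2, "svyyear": 2007, "mj": 2, "hinc": 3000},
]
per_imputation = mean_income_by_imputation(toy_rows, 2007)
pooled = pooled_point_estimate(per_imputation)
```

This sketch pools only the point estimate and ignores survey weights; pooled standard errors additionally need the within- and between-imputation variances.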

B. New Variables 

B.1 Dataset XPBRUTTO

  • XEWSTATU: Proxy information on non-responding persons regarding their labor force status in households with partial unit non-response.


B.2 Dataset $PEQUIV

  • P11101$$: Copy of the wave-specific variables on overall life satisfaction.

B.3 Dataset $HGEN

  • I_HINC$$: Multiply imputed version of HINC$$, the monthly net household income. Imputations 1-5 are available in wide format in $HGEN (1996-2007 only); all 10 generated imputations are available in long format in a separate dataset called MIHINC. Additional information can be found in HGEN.pdf.
  • FHINC$$: Imputation flag for I_HINC$$, 0 means not imputed and 1 otherwise.
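
The names above use placeholders: judging from the names in this document (waves A-X for 1984-2007, WPGEN/XPGEN, EGP06), a single $ appears to stand for the wave letter and $$ for the two-digit survey year. A minimal sketch of that expansion follows; the helper expand_name is hypothetical and not part of any SOEP tool.

```python
import string

FIRST_YEAR = 1984  # wave A
LAST_YEAR = 2007   # wave X, the last wave in this distribution


def expand_name(template, year):
    """Expand "$" to the wave letter and "$$" to the two-digit year.

    Hypothetical helper; the convention is inferred from names such as
    WPGEN (2006) and EGP06 (2006) in this document.
    """
    if not FIRST_YEAR <= year <= LAST_YEAR:
        raise ValueError(f"year {year} is outside {FIRST_YEAR}-{LAST_YEAR}")
    wave = string.ascii_uppercase[year - FIRST_YEAR]
    two_digit_year = f"{year % 100:02d}"
    # Replace "$$" first so it is not consumed as two single "$".
    return template.replace("$$", two_digit_year).replace("$", wave)
```

For example, expand_name("$PGEN", 2006) gives "WPGEN" and expand_name("I_HINC$$", 1996) gives "I_HINC96".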

C. Revised Variables

C.1 In the Dataset $PKAL

  • $P2D03 + $P2E03: In waves U-W (years 2004-2006), an incorrect "does not apply" missing (-2) was corrected to a "no answer" missing (-1) in some cases.
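
The distinction between the two missing codes matters when recoding, because "does not apply" and "no answer" carry different information. A minimal sketch of keeping them apart, covering only the two codes named here (the helper split_missing is hypothetical):

```python
# Only the two codes mentioned in this document are handled here.
MISSING_CODES = {-1: "no answer", -2: "does not apply"}


def split_missing(value):
    """Return (valid_value, missing_reason); exactly one of them is None."""
    if value in MISSING_CODES:
        return None, MISSING_CODES[value]
    return value, None
```

For example, split_missing(-2) returns (None, "does not apply"), while a valid code such as 3 is passed through as (3, None).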

C.2 In the Dataset HHRF/PHRF

  • WPHRF*: All weighting factors for the year 2006 are now based on microcensus benchmark data from 2006.

    However, the weighting factors for the year 2007 are also based on the (newest available) microcensus benchmark data from 2006; they are therefore only provisional with regard to the figures given for households and individuals in Germany.

  • VHHRF + VHHRF1: One household from sample G was corrected and set to 0.


C.3 In the Dataset $PGEN

  • LFS$$: The variable "labor force status" has been improved over all waves with respect to the accuracy of classifying individuals as "non-working and older than 65" (category 2). The month of birth is now used to determine whether the person was older than 65 at the time of the interview.
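
A minimal sketch of why the month of birth matters for this check, assuming that year and month are available for both the birth date and the interview. The function names are hypothetical, and treating a person interviewed in their birth month as not yet older is an assumption, since the day of birth is not used.

```python
def age_in_months(birth_year, birth_month, interview_year, interview_month):
    """Age at the interview in whole months, from year and month only."""
    return (interview_year - birth_year) * 12 + (interview_month - birth_month)


def is_older_than(birth_year, birth_month, interview_year, interview_month, years=65):
    """True if the person had passed the given age at the interview.

    Assumption: an interview in the birth month itself does not yet
    count as older, because the day of birth is not taken into account.
    """
    months = age_in_months(birth_year, birth_month, interview_year, interview_month)
    return months > years * 12
```

Two people born in 1941 and interviewed in March 2006 both differ from the interview year by 65, but only the one born in January (and not the one born in August) had passed 65 by then.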


D. Error Updates

D.1 In the Dataset VH and WH

  • We have corrected the value labels for the variables indicating the owner of the dwelling (VH27 and WH27). Please note the relevant corrections in the table below.

    Variable Label: Owner Of The Dwelling

    -2   Does not apply              Does not apply
    -1   No answer                   No answer
     1   Self Owned Res. Property    Local Govt. Apt.
     2   Local Govt. Apt.            Co-Operative Apt.
     3   Co-Operative Apt.           Company Apt.
     4   Company Apt.                Private Owner
     5   Private Owner               Do Not Know

D.2 In the dataset $PGEN

  • EGP$$: The variable "Erikson and Goldthorpe Class Category" (international socio-economic index of occupational status) has been corrected with respect to the assignment of individuals to category (18) "not working - pensioner". Up to now, all pension recipients, i.e. recipients of retirement pension and recipients of widow's/orphan's pension have been erroneously classified as "not working - pensioner" if none of the other categories applied. In the corrected generation of the EGP$$ variable, which applies to all waves, non-working persons are only assigned to this category if they are recipients of a retirement pension or if they are recipients of orphan's/widow's pension AND are older than 60 years. Moreover, if there is missing information on pension receipt, additional information from ARTKALEN (retrospective information from the activity calendar for the previous year) is used in the generation process to determine if a person was in retirement or early retirement ("Vorruhestand") at the time of the interview. All other non-working persons are assigned to category (-2) "does not apply" as long as they are not registered as unemployed (category 15).
  • STIB$$: The same misclassification of individuals to the category "pensioner" (13) applied to the variable for the "Occupational position" and has been corrected for all waves in the same way as for EGP$$.
  • NACE$$: The variable for the "two-digit NACE Industry - Sector" had several inconsistencies with respect to the labeling. In particular, the labels for code (90) "Sewage And Refuse Disposal, Sanitation And Related" and code (95) "Private Households With Employed Persons" had to be swapped. Some other labels were not accurate, and have been stated more precisely for all waves.
  • IS88$$, ISEI$$, MPS$$, SIOPS$$, KLAS$$, EGP$$: The questions underlying these variables are not asked of all employed persons every year. In the survey years 1985, 1986, 1987, 1988, 1990 (West), 1992 (West), 1994, 1996, 1999, 2001, 2003, 2005, and 2006, only employed persons who changed jobs and first-time respondents are asked to provide up-to-date information. Hence, in years with a partial survey, these variables should contain the available previous year's information for all employed persons without a job change who did not update the information on their current occupation. For some individuals, however, the previous year's data had mistakenly not been carried over. This mistake was corrected by newly generating these variables for all waves in an accurate and consistent way.
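
The carry-over logic for partial-survey years described in the last item can be sketched as follows. This is an illustration under simplifying assumptions, not the actual generation routine: the function name is hypothetical, None stands for "no updated information given", and neither job changes, employment status, nor the West-only restriction for 1990 and 1992 are modeled.

```python
# Partial-survey years as listed above (1990 and 1992 applied to the West only).
PARTIAL_SURVEY_YEARS = frozenset(
    {1985, 1986, 1987, 1988, 1990, 1992, 1994, 1996, 1999, 2001, 2003, 2005, 2006}
)


def fill_partial_years(reported, partial_years=PARTIAL_SURVEY_YEARS):
    """Carry the previous year's value into partial-survey years.

    `reported` maps survey year -> value, with None where the person gave
    no updated information. Only the directly preceding year is used.
    """
    filled = {}
    for year in sorted(reported):
        value = reported[year]
        if value is None and year in partial_years:
            # May itself be a carried-over value from the year before.
            value = filled.get(year - 1)
        filled[year] = value
    return filled


# Made-up example: one occupation code reported in 2004, no updates afterwards.
example = fill_partial_years({2004: 2411, 2005: None, 2006: None, 2007: None})
```

In the example, the 2004 value is carried into 2005 and 2006, while 2007 stays empty because it is not listed as a partial-survey year.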

Dec. 04, 2008: In the process of extensive checking, several problems were identified in the 1984-2007 data distribution currently available on DVD (waves A-X).

The corrected datasets are now available for download as a password-protected ZIP file from our homepage. To obtain download access to the corrected datasets, please contact the SOEPhotline by e-mail or by phone at +49 30 89789 292.

To unzip the files, you will need the password for the current 1984-2007 data distribution, or the password used to access the expanded regional data in the GGKBOU dataset. If you do not have the current data distribution, please contact our hotline.

The fixed files are:

  • HHRF (weighting factors for households): in preparing the weighting factors for households, an older version was mistakenly distributed for the variables WHHRFALL and XHHRFALL. These have now been replaced with the revised version.
  • PBIOSPE: Due to a problem in data storage, some of the earnings biographies surveyed since wave U for the first time or subsequently were not recorded correctly. PBIOSPE was therefore revised retroactively from wave U on.
  • XHBRUTTO: Here, an erroneous code for the East German federal states was corrected in the variable XBULA.
  • WP: WKLAS, WIS88, WIS88N and WKLASN were updated. This was necessary since some data had been overwritten with missings.
  • WPGEN/XPGEN: Because of the corrections to WKLAS and WIS88 in WP, it was necessary to update some generated variables that are derived from $KLAS and $IS88. These are the variables IS8806, ISEI06, MPS06, SIOPS06, EGP06 and KLAS06. Furthermore, due to the revision of PBIOSPE (see above), EXPFT$$, EXPPT$$ and EXPUE$$ were also updated.
  • HBRUTT00: Because of a conflict in household IDs for the expanded original gross sample F, the household IDs had to be changed in some cases. This only applies to households that did not provide valid SOEP interviews.
  • GGKBOU: As a result of the change in HBRUTT00, the identifier HHNR was adapted in some cases in this dataset as well.

Survey Instruments 2007: Field-de

Please find all sample-specific questionnaires of this year and all questionnaires of previous years on this site.

1) Handgreifkraftmessung im Sozio-oekonomischen Panel (SOEP) 2006 und 2008

2) The new IAB-SOEP Migration Sample: an introduction into the methodology and the contents

3) The Request for Record Linkage in the IAB-SOEP Migration Sample

4) Flowcharts for the Integrated Individual-Biography Questionnaire of the IAB-SOEP Migration Sample 2013

5) The Measurement of Labor Market Entries with SOEP Data: Introduction to the Variable EINSTIEG_ARTK

6) Job submission instructions for the SOEPremote System at DIW Berlin – Update 2014

7) SOEP 2015 – Informationen zu den SOEP-Geocodes in SOEP v32

8) Editing and Multiple Imputation of Item Non-response in the Wealth Module of the German Socio-Economic Panel

9) Die Vercodung der offenen Angaben zu den Ausbildungsberufen im Sozio-Oekonomischen Panel

10) Das Studiendesign der IAB-BAMF-SOEP Befragung von Geflüchteten

11) Scales Manual IAB-BAMF-SOEP Survey of Refugees in Germany – revised version

12) SOEP 2010 – Preparation of data from the new SOEP consumption module: Editing, imputation, and smoothing

13) SOEP Scales Manual (updated for SOEP-Core v32.1)

14) Kognitionspotenziale Jugendlicher - Ergänzung zum Jugendfragebogen der Längsschnittstudie Sozio-oekonomisches Panel (SOEP)

15) Die Vercodung der offenen Angaben zur beruflichen Tätigkeit nach der International Standard Classification of Occupations 2008 (ISCO08) - Direktvercodung - Vorgehensweise und Entscheidungsregeln bei nicht eindeutigen Angaben

16) Die Vercodung der offenen Angaben zur beruflichen Tätigkeit nach der Klassifikation der Berufe 2010 (KldB 2010): Vorgehensweise und Entscheidungsregeln bei nicht eindeutigen Angaben

17) Multi-Itemskalen im SOEP Jugendfragebogen

18) Zur Erhebung des adaptiven Verhaltens von zwei- und dreijährigen Kindern im Sozio-oekonomischen Panel (SOEP)

19) Documentation of ISCED Generation Based on the CAMCES Tool in the IAB-SOEP Migration Samples M1/M2 and IAB-BAMF-SOEP Survey of Refugees M3/M4 until 2017

20) Missing Income Data in the German SOEP: Incidence, Imputation and its Impact on the Income Distribution

21) SOEP 2006 – TIMEPREF: Dataset on the Economic Behavior Experiment on Time Preferences in the 2006 SOEP Survey

22) Assessing the distributional impact of "imputed rent" and "non-cash employee income" in microdata : Case studies based on EU-SILC (2004) and SOEP (2002)

All documentation for filtering can be found on this page