We want to make working with our data as easy as possible. Because the large number of topics and the longitudinal structure of the datasets can make this challenging, we answer the most important questions on this page.
No. SOEP data can only be used for scientific research. We promise our respondents that their data will be used solely for scientific research purposes, and ensure that all data users adhere to this condition. We require all data users to sign a contract confirming their compliance with data protection measures before receiving access to the data, and only grant data use contracts to researchers who work at a scientific research institution.
Data collection and processing are financed largely through public funding. This allows us to provide the data to registered data users without any costs or fees.
All relevant details are described on the data access page.
If any of your questions remain unanswered, you will find the contact details of our SOEPhotline there. Our staff will be happy to help you.
For an introduction to the topics and structure of the SOEP-Core data, see our SOEPcompanion or take a look at our questionnaires. Our metadata portal, paneldata.org, can be used to search for specific topics or keywords, or to get an overview of the variables (including frequency counts) and questionnaires, at either the cross-sectional or the longitudinal level.
For a compilation of articles on various possibilities for analyzing SOEP data, see this page.
Take a look at Getting Started, where we describe the various tools and services we offer for first-time and more experienced SOEP data users. There you can see at a glance which tools and services fit your needs.
We always provide the SOEP data as a “complete package”: every answer to every question in every year. We prepare the data to be as user-friendly as possible—for example, by using generated variables and the “long” data format. Nevertheless, we strongly recommend that data users learn how to work with statistical software (SPSS, Stata, R, SAS, etc.) for data analysis.
The structure of our main dataset, SOEP-Core, is described in the SOEPcompanion.
Yes. Our codebooks are the questionnaires “with reference to variables” that we provide online. They are based on metadata and contain references to variables and variable labels, and the datasets in which they are contained, as well as values for the different items and the filter questions used in the questionnaire.
A detailed explanation of how to use the codebooks can be found in our SOEPcompanion.
Like status variables, generated variables serve to simplify your work with SOEP data. We offer generated datasets on specific topics (e.g., BIOJOB “Detailed information on first and last job”) and specific groups (e.g., BIOPAREN “Biographical information on the parents of SOEP respondents”).
A list of SOEP-Core datasets containing generated variables and a short description of each can be found in the SOEPcompanion. The specific assumptions made in the process of variable generation are covered in the documentation for each dataset.
The short answer is that there are so many datasets because we want to know so much about our respondents. At present, we use up to 13 questionnaires in the main samples alone. We aim to provide the data for analysis in as authentic and unaltered a form as possible. Wherever it appears useful, we also provide generated variables (see question above), which are contained in separate datasets.
The SOEPcompanion offers a concise overview of the different types of datasets, the information they provide, and how to use them.
That depends. Unfortunately, we are not able to provide all information in both languages and are moving toward providing the majority of our documentation in English only. An overview:
Questionnaires: Many of our SOEP-Core questionnaires are provided as “field versions”, which contain both the German PAPI versions of the person and household questionnaires used in the field and their English translations. For SOEP-IS and some other studies, we provide only the “version with variables”, in separate German and English versions.
Dataset documentation: These publications are usually only in English. There are a few (older) publications providing data documentation or documentation on specific topics in German.
Methodological reports: The methodological reports on the fieldwork prepared by the survey institute were available exclusively in German up to and including the survey year 2017 (with the exception of SOEP-IS: main fieldwork in English, supplementary survey in German). The 2011-2017 SOEP Wave Reports include a summary in English.
Study Description: Available on our website in English and German. Please use the language switch in the menu bar for the other language.
Companions: In English only.
Paneldata.org: Only the labels are in both languages; all other information is in English.
We recommend 16 GB of RAM for working with the SOEP data. Users with less powerful computers can still use the data, but should make a few adjustments in Stata:
[a] use “describe using pl.dta” to display the variables without loading them, and
[b] load only the variables you need: “use pid syear plVARS using pl.dta [if syear>=xxxx]”.
This allows you to work with larger datasets more effectively and with lower demands on your hardware and software.
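The two adjustments above can be sketched as a short Stata do-file. This is only an illustration under stated assumptions: plh0182 and the cutoff year 2010 are placeholders standing in for “plVARS” and “xxxx”, and the if qualifier is written before “using”, which is the order given in the syntax diagram of the Stata manual. The final compress step is an optional extra that is not part of the recommendation above.

```stata
* Sketch only: plh0182 and the year 2010 are placeholders (assumptions),
* standing in for "plVARS" and "xxxx" in the text above.

* [a] List variable names, storage types, and labels without loading the data
describe using pl.dta

* [b] Load only the identifiers plus the variables and survey years you need
use pid syear plh0182 if syear >= 2010 using pl.dta, clear

* Optional: let Stata pick smaller storage types where no information is lost
compress
```

Because only the selected columns and rows are read from disk, the memory footprint depends on your selection rather than on the full size of pl.dta.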
To display all German variable labels in Stata in English, please use the command: label language EN.
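As a minimal sketch, you can first check which label languages a dataset defines and then switch to the English set. The file and variable names are the illustrative ones used earlier on this page, and the sketch assumes the dataset ships with an EN language.

```stata
* Sketch only: pl.dta, pid, and syear are the illustrative names used above
use pid syear using pl.dta, clear

* With no argument, label language lists the defined languages and the current one
label language

* Switch data, variable, and value labels to the English set
label language EN

* Check the result: variable labels should now be shown in English
describe
```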