DIW Berlin: Questions and Answers (FAQ)

We want to make working with our data as easy as possible. Because the data cover a large number of topics and have a longitudinal structure, questions inevitably come up. We try to answer the most important ones on this page.

How do I get the data?

Are the data available to anyone?

No. SOEP data can only be used for scientific research. We assure our respondents that their data will be used solely for scientific research purposes, and that all data users adhere to this condition. We require all data users to sign a contract confirming their compliance with data protection measures before receiving access to the data, and we only grant data use contracts to researchers who work at a scientific research institution.

How much do the data cost?

Data collection and processing are financed largely through public funding. This allows us to provide the data to registered data users without any costs or fees.

And how do I get the data?

The data access page describes all the relevant steps.
If any questions remain, the contact details for our SOEPhotline are listed there. Our staff will be happy to help you.

At the beginning

Could the SOEP data be useful for my research project?

For an introduction to the topics and structure of the SOEP-Core data, see our SOEPcompanion or take a look at our questionnaires. Our metadata portal, paneldata.org, can be used to search for specific topics or keywords or to get an overview of the variables (including frequency counts) and questionnaires. Searches can be run at either the cross-sectional or the longitudinal level.

For a compilation of articles on various possibilities for analyzing SOEP data, see this page.

This is my first time working with SOEP data. What is the best way to get started?

Take a look at Getting Started where we describe the various tools and services we offer for first-time and more experienced SOEP data users. There you can see at a glance which tools and services fit your needs.

I only need a small subset of the SOEP data (a specific year or a few variables). How can I get this information?

We always provide the SOEP data as a “complete package”: every answer to every question in every year. We prepare the data to be as user-friendly as possible—for example, by using generated variables and the “long” data format. Nevertheless, we strongly recommend that data users learn how to work with statistical software (SPSS, Stata, R, SAS, etc.) for data analysis.

The structure of our main dataset, SOEP-Core, is described in the SOEPcompanion.

Are codebooks available?

Yes. Our codebooks are the questionnaires “with reference to variables” that we provide online. They are generated from metadata and show, for each item, the variable name and label, the dataset containing the variable, the values, and the filter questions used in the questionnaire.

A detailed explanation of how to use the codebooks can be found in our SOEPcompanion.

What are generated variables and when are they best used?

Like status variables, generated variables serve to simplify your work with SOEP data. We offer generated datasets on specific topics (e.g., BIOJOB “Detailed information on first and last job”) and specific groups (e.g., BIOPAREN “Biographical information on the parents of SOEP respondents”).

A list of SOEP-Core datasets containing generated variables and a short description of each can be found in the SOEPcompanion. The specific assumptions made in the process of variable generation are covered in the documentation for each dataset.

Why does SOEP-Core include so many datasets, and how do I find my way around?

The short answer is that there are so many datasets because we want to know so much about our respondents. At present, we use up to 13 questionnaires in the main samples alone. We aim to provide data for analysis that are as authentic and unaltered as possible. Whenever it appears useful, we also provide generated variables (see question above), which are contained in separate datasets.

SOEPcompanion offers a concise overview of the different types of datasets, the information they provide, and how to use them.

Who in a SOEP household answers which questionnaire, and what topics do the questionnaires cover?

Both SOEPcompanion (for SOEP-Core) and SOEP-IScompanion (for the SOEP Innovation Sample) contain a “Survey Design” chapter in which we try to answer these questions in detail.

Is the information I need available in both German and English?

That depends. Unfortunately, we are not able to provide all information in both languages and are moving toward providing the majority of our documentation in English only. Here is an overview:

Questionnaires: Many of our SOEP-Core questionnaires are provided as “field versions,” which contain both the German PAPI version of the person and household questionnaires used in the field and their English translation. For SOEP-IS and some other studies, we provide the “version with variables” only, in separate German and English versions.

Dataset documentation: These publications are usually only in English. There are a few (older) publications providing data documentation or documentation on specific topics in German.

Methodological reports: The methodological reports on the fieldwork prepared by the survey institute were available exclusively in German up to and including the survey year 2017 (with the exception of SOEP-IS: main fieldwork in English, supplementary survey in German). The 2011-2017 SOEP Wave Reports include a summary in English.

Study Description: Available on our website in English and German. Please use the language switch in the menu bar for the other language.

Companions: In English only.

Paneldata.org: Only the labels are in both languages; all other information is in English.

Help for working with the data

I can’t work with some SOEP-Core data because my computer doesn’t have enough memory. What can I do?

We recommend a computer with 16 GB of RAM for working with the SOEP data. Users with less powerful computers can still use the data, but should make a few adjustments.

You can

[a] use “describe using pl.dta” to display the variables without loading them, and

[b] load the variables selectively: “use pid syear plVARS [if syear>=xxxx] using pl.dta”. In Stata, the optional “if” condition comes before “using”.

This allows you to work with larger datasets more effectively and with lower demands on your hardware and software.
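
The two steps above can be combined into a short Stata session. This is only a sketch: the variable name plh0182 and the year 2010 are placeholders chosen for illustration, so please replace them with the variables and survey years you actually need, as listed by “describe”. The final “compress” step is an optional addition that is not part of the recommendation above.

```stata
* List the variables stored in pl.dta without loading the data into memory
describe using pl.dta

* Load only the identifiers plus the variables you need, for selected years.
* plh0182 and 2010 are placeholders (assumptions), not a recommendation.
use pid syear plh0182 if syear >= 2010 using pl.dta, clear

* Optional: store each variable in the smallest suitable type to save memory
compress
```

Keeping pid and syear in every extract makes it possible to merge the extract with other datasets later.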

How can I display German variable labels in Stata in English?

To display all German variable labels in Stata in English, please use the command: label language EN.
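
As a short sketch, the command can be used after loading a dataset as follows. Running “label language” without an argument lists the label languages defined in the dataset. The name DE for the German labels is an assumption here, so please check the list before switching back.

```stata
* Load a small extract (pid and syear are the identifiers used above)
use pid syear using pl.dta, clear

* List the label languages defined in the dataset
label language

* Switch variable and value labels to English
label language EN

* Switch back to German (assumes the German label set is named DE)
label language DE
```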
