Reports, News of 29 November 2018

SOEP-Core v34 – What’s New?

1. New, user-friendly integrated data format

In the new wave of the SOEP-Core study, we are bringing together the wide and long data formats, which were previously provided to users separately. In so doing, we aim to eliminate any confusion about what is available in which format and to make data use easier overall. After having tested SOEPlong for several years as an additional service facilitating data analysis for both beginning and longtime users, we will now be providing all datasets in “long” format as a standard part of our SOEP data release.

This means that all data users will be receiving the different SOEP data formats listed below in their data file, some of which will be in separate subdirectories. Please make sure that you unpack the entire directory structure when unpacking your data.

1.1 “Long” format on the top level

In the top-level (or root) directory, you will find all of the datasets provided up to now with SOEPlong (e.g., pl, ppfadl, etc.) as well as all of the additional datasets formerly provided only in our classic “wide” format (e.g., the biographical or spell data such as bioparen, artkalen, etc.). All of the data in the main SOEP-Core study are therefore covered by the datasets in the top-level directory.

1.2 Classic format in the subdirectory “raw”

Since we know that many users have existing scripts that are based on the original data format, and to enable users to understand the process of generating the “long” data, we provide all of the datasets in their original SOEP format in the directory “raw”. Users who want to continue using the old format simply need to switch into this subdirectory and use the datasets there. The only change is that all of the datasets in the “raw” directory now contain additional identifiers named as in the long format (pid alongside persnr, and hid alongside $hhnrakt) as well as an “syear” variable, so that users can easily merge variables from the two data formats.
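
As a rough illustration of such a merge, the following minimal sketch joins records from a “long” dataset and a “raw” dataset on the shared identifiers pid and syear. The records and the variable names income_long and raw_var are hypothetical and are not taken from the SOEP data; the helper merge_on_keys is likewise an assumption made for this example and is not part of any SOEP tool.

```python
# Hypothetical toy records. Only the key names pid and syear come from the
# release notes; income_long and raw_var are invented for illustration.
long_rows = [
    {"pid": 101, "syear": 2016, "income_long": 1800},
    {"pid": 101, "syear": 2017, "income_long": 1900},
    {"pid": 202, "syear": 2017, "income_long": 2500},
]
raw_rows_2017 = [
    {"pid": 101, "syear": 2017, "raw_var": 3},
    {"pid": 202, "syear": 2017, "raw_var": 1},
]


def merge_on_keys(left, right, keys=("pid", "syear")):
    """Inner-join two lists of dict records on the given key columns.

    Assumes each key combination occurs at most once in `right`.
    """
    index = {tuple(row[k] for k in keys): row for row in right}
    merged = []
    for row in left:
        match = index.get(tuple(row[k] for k in keys))
        if match is not None:
            # Combine the columns of both records into one row.
            merged.append({**row, **match})
    return merged


merged = merge_on_keys(long_rows, raw_rows_2017)
for row in merged:
    print(row)
```

For household-level datasets, the same approach would presumably use hid and syear as keys instead.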

1.3 The “long”-format SOEP data

After having provided the “long” data for several years as an additional service to facilitate data use, we are convinced that this format is easier to use, especially for beginners. We have therefore decided to use this as our primary data format in future data releases.

All available individual year-specific datasets are pooled into a single dataset (e.g., all $P datasets are integrated into the PL dataset). In some cases, this means that we have to harmonize variables over time. Harmonization is undertaken to be able to define variables consistently over time: For instance, income information for the years up to 2001 is given in euros rather than in deutschmarks, and in cases where questionnaires have changed, the categories are modified over time. All changes are presented to users in a clear and understandable way, and all modified variables are also provided in their original form. SOEPlong thus significantly reduces the number of datasets and the number of variables.
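
The pooling and harmonization described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual SOEP generation procedure: the input structure, the function pool_to_long, and the variable names income, income_orig, and income_eur are hypothetical. The conversion uses the official fixed rate of 1 euro = 1.95583 deutschmarks, and it assumes that values in waves up to and including 2001 were originally recorded in deutschmarks.

```python
DM_PER_EURO = 1.95583  # official fixed conversion rate

# Hypothetical year-specific datasets, keyed by survey year.
yearly = {
    2000: [{"pid": 101, "income": 3911.66}],  # assumed to be in deutschmarks
    2002: [{"pid": 101, "income": 2100.00}],  # assumed to be in euros
}


def pool_to_long(yearly_data, last_dm_year=2001):
    """Pool year-specific records into one long list with an syear column.

    Income is harmonized to euros, and the original value is kept alongside.
    """
    rows = []
    for syear in sorted(yearly_data):
        for rec in yearly_data[syear]:
            income = rec["income"]
            if syear <= last_dm_year:
                income_eur = round(income / DM_PER_EURO, 2)
            else:
                income_eur = income
            rows.append(
                {
                    "pid": rec["pid"],
                    "syear": syear,
                    "income_orig": income,    # value as originally recorded
                    "income_eur": income_eur, # harmonized value
                }
            )
    return rows


pooled = pool_to_long(yearly)
for row in pooled:
    print(row)
```

Keeping the original value next to the harmonized one mirrors the principle that modified variables remain available in their original form.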

A more detailed description of the future format of our SOEP-Core data releases can be found in our new SOEPcompanion.

2. New EU-SILC clone

Many users are undoubtedly aware that the SOEP supports cross-national analysis with CNEF through the dataset pequiv. We have now produced a data product that allows you to use the SOEP data in comparative analyses with the EU-SILC (European Union Statistics on Income and Living Conditions) data. EU-SILC, which is provided by Eurostat upon request, offers cross-sectional and longitudinal information for many European countries. Up to now, only cross-sectional information has been available for Germany. The EU-SILC clone offers longitudinal information on private households in Germany based on the SOEP data. All of the information contained in it can be directly compared with the EU-SILC longitudinal information on other European countries. The EU-SILC clone is integrated into the standard SOEP data release (in another subdirectory). Documentation on the 2005-2016 EU-SILC clone can be found here.

3. New samples in the main SOEP study

The new SOEP data release (v34) will be the first to contain data from the IAB-BAMF-SOEP Survey of Refugees in Germany as Sample M5, as well as the continuation of the PIAAC-L Survey, as Sample N.

IAB-BAMF-SOEP Survey of Refugees (M5)

The SOEP, in cooperation with the Institute for Employment Research (IAB) and the Federal Office for Migration and Refugees (BAMF), has succeeded in integrating a third sample of refugee households (M5) into the SOEP study. The survey was launched in 2017. The population of M5 covers adult refugees who have applied for asylum in Germany since January 1, 2013, and are currently living in Germany. M5 added another 1,519 households of refugees who have migrated to Germany since 2013 to the SOEP framework.  

Integration of respondents from PIAAC-L as Subsample N

Sample N integrated 2,314 households of former participants of the Program for the International Assessment of Adult Competencies (PIAAC and PIAAC-L) in 2017. This is the most recent addition to the SOEP-Core samples. Fieldwork in Sample N was conducted between mid-March and mid-August and thus slightly later than the majority of samples A–L1. More information on the PIAAC-L project