DOI 10.5684/soep.iab-soep-mig.2015

Release of the IAB-SOEP-Migration Samples 2015


Title: IAB-SOEP Migration Samples (M1, M2), data of the years 2013-2015

DOI: 10.5684/soep.iab-soep-mig.2015

Publication date: February 16, 2017


Principal investigators: Herbert Brücker, Jürgen Schupp

Co-PIs: Martin Kroh, Jan Goebel, Parvati Trübswetter

Affiliated staff members for providing the Scientific Use File: Charlotte Bartels, Philipp Eisnecker, Klaudia Erhardt, Alexandra Fedorets, Markus Grabka, Marco Giesselmann, Peter Krause, Simon Kühne, David Richter, Rainer Siegers, Paul Schmelzer, Christian Schmitt, Daniel Schnitzlein, Carsten Schröder, Knut Wenzig

Data collection: TNS Infratest Sozialforschung GmbH.

Collection period: 2013-2015

If you publish using these data, you must cite the following reference:

Herbert Brücker, Martin Kroh, Simone Bartsch, Jan Goebel, Simon Kühne, Elisabeth Liebau, Parvati Trübswetter, Ingrid Tucci & Jürgen Schupp (2014): The new IAB-SOEP Migration Sample: an introduction into the methodology and the contents. SOEP Survey Paper 216, Series C. Berlin, Nürnberg: DIW Berlin.

Comprehensive documentation is available in

Data access

All SOEP users with a valid contract can order the data from the IAB-SOEP Migration Sample in the usual ways (SOEPhotline, website) without signing any additional data distribution contracts. The dataset is available free of charge upon request via personalized secure download. New users can find information on the SOEP application process here.

Via the Research Data Centre (FDZ) of the Federal Employment Agency at the Institute for Employment Research: the application process of the FDZ IAB.

Study information

The IAB-SOEP Migration Sample is a joint project of the Institute for Employment Research (IAB) and the Socio-Economic Panel (SOEP) at the German Institute for Economic Research (DIW Berlin). The project attempts to overcome limitations of previous datasets by drawing a sample that takes into account changes in the structure of migration to Germany since 1995. The dataset is an additional sample for the SOEP-Core study and is therefore completely harmonized with the SOEP and integrated into SOEP v32 (identical questionnaire with additional questions on the respondent's migration situation). The study opens up new perspectives for migration research and gives insights into the living situations of new immigrants to Germany.


The sampling frame of the IAB-SOEP Migration Sample M1 is based on the administrative records in the Integrated Employment Biographies (IEB) of the IAB, and the households were first surveyed in 2013. The sample was drawn from 250 regional units with a multi-step procedure that permits a random selection of individuals in the target population using an equal probability selection method. In each of the 250 regional units, 80 addresses were drawn randomly, with some countries of origin given a higher selection probability to ensure a sufficient number of observations. These groups consist of immigrants from the new EU member states and Southern European countries.

The households from the second IAB-SOEP Migration Sample (M2), surveyed in 2015, are now also included in this release. The target population of the second IAB-SOEP Migration Sample consists of immigrants who arrived in Germany between 2010 and 2013. Migrants from the new EU member states in Eastern Europe dominate this group. This focus will make it possible to better describe the dynamic recent evolution of immigration to Germany. Sample M2 consists of 1,096 households and was, like sample M1, drawn from register data from the Federal Employment Agency.


The sampled individuals were interviewed with a personal questionnaire, including questions on their (migration) biography, and a household questionnaire. All family members aged 16 years or older were also asked the questions of the personal questionnaire. From the second survey year on, all other SOEP questionnaires are in the field, in particular the age-specific Mother and Child Questionnaires.

The IAB-SOEP Migration Sample provides a database that makes it possible to gain new insights into various aspects of immigration, such as the activation and attraction of skilled immigrants, and to provide clear guidance on immigration, integration and labour market policies in Germany. The database contains the following information:

  • Migration history: Year of immigration, migration history, search behaviour and information channels, social networks.
  • Education history: highest obtained schooling and vocational degrees, years of schooling, education acquired at home and abroad, acknowledgement procedure of foreign credentials, language proficiency.
  • Employment history: Employment, self-employment, unemployment in Germany and in foreign countries.
  • Labour market background: Earnings, full- and part-time employment, working hours, benefit assistance, reservation wage, participation in active labour market policies.
  • Return migration: Return migration intentions, return migration, limited survey of returned immigrants.
  • Miscellaneous: Remittances to home countries, life satisfaction, risk preferences, social integration and acceptance.

The questionnaire covers the complete migration, education, and labor market histories of respondents, both in their country of origin and in all other countries in which they have lived. Additionally, the questionnaire includes several new batteries of questions that have not previously been considered in the SOEP or other household surveys in Germany, or not in the necessary depth. Examples include questions on earnings, labor market integration and occupational status before migration, migration decisions in the family and partnership context, and purposes and transfer channels of remittances.

The IAB-SOEP Migration Sample substantially increases the available sample size in the SOEP-Core study for research on migration and the lives of immigrants in Germany. 4,964 persons residing in 2,723 households participated in the first wave of the survey. Since the survey is also included in the regular SOEP, migrants from the other SOEP-Core samples can be included in analyses, increasing the number of observations further.

Record Linkage

Please note that data from both samples can be linked with administrative employment and income data; survey respondents are asked to provide explicit consent to record linkage. Because this linked dataset contains social data, these weakly anonymized data are only accessible on site at the Research Data Center of the German Federal Employment Agency at the IAB (FDZ IAB). Researchers can access FDZ IAB data through a guest visit to the IAB or through remote data processing, also arranged with the IAB. The linked data will soon be available to external researchers. Requests for data access should be directed to FDZ IAB, since a contract with IAB for data use is required.


Additional documents

The new IAB-SOEP Migration Sample: an introduction into the methodology and the contents (SOEP Survey Paper 216)

Flowcharts for the Integrated Individual-Biography Questionnaire of the IAB-SOEP Migration Sample 2013 (SOEP Survey Paper 261)

How to Generate Spell Data from Data in "Wide" Format based on the migration biographies of the IAB-SOEP Migration Sample (SOEP Survey Paper 228)

Methodenbericht zum IAB-SOEP-Migrationssample 2013 (German)
Fieldwork Report 2013 | PDF, 10.16 MB (part of the SOEP Wave Report 2013)
Fieldwork Report 2014 | PDF, 4.61 MB (part of the SOEP Wave Report 2014)
Fieldwork Report 2015 | PDF, 11.33 MB (part of the SOEP Wave Report 2015)

The 2013 IAB-SOEP Migration Sample (M1): Sampling Design and Weighting Adjustment (SOEP Survey Paper 271)

Data description

Data structure

The data structure is very similar to the structure used in SOEP-Core. Each wave is identified by letters of the alphabet: the first wave in 1984 is wave “A”, 1985 is wave “B”, and so on, up to “BF” in 2015. To simplify notation, the “$” sign is used when all waves of one group of datasets are referred to. For example, $H refers to all household-level datasets AH to BFH. For each year of SOEP data there are single data files for households (e.g. $H) as well as for individual respondents (e.g. $P) and children (e.g. $KIND) based on interview information. These observations make up the “net” population, with each of these files containing as many records as interviews could be conducted. Additional data files with a limited number of variables based on the “address log” constitute the “gross” number of households and persons, i.e. all households and their members who were eligible for an interview in any given year. For an overview, please see the table below.

Data set | Label | Survey years | Subject of analysis
ppfad | Individual Tracking File | | P
hpfad | Household Tracking File | | H
$$p_mig | Integrated personal and biographical questionnaire (Sample M specific) | 2013/14/15 | P
migspell | Migration biography in spell format | | P
$$p | Personal questionnaire | 2013/14/15 | P
$$h | Household questionnaire | 2013/14/15 | H
$$kind | Data on children (from HH-Questionnaire) | 2013/14/15 | P
$$pgen | Generated Individual Data | 2013/14/15 | P
$$pkal | Individual Calendar | 2013/14/15 | P
$$hgen | Generated Household Data | 2013/14/15 | H
mihinc | Multiple imputed data on monthly household income | | H
pflege | Persons needing care within the household | | P
health | Health indicators | | P
$$hbrutto | Gross Household Data | 2013/14/15 | H
$$pbrutto | Gross Individual Data | 2013/14/15 | P
hhrf | Weighting and staying probabilities | | H
phrf | Weighting and staying probabilities | | P
biobirth | Generated biographical information: Birth Biography of Female and Male Respondents | | P
bioedu | Generated biographical information: educational participation and transition | | P
biocouplm | Generated biographical information: couple history, monthly | | P
biocouply | Generated biographical information: couple history, annual | | P
bioimmig | Generated biographical information: Generated and Status Variables for Foreigners | | P
biojob | Generated biographical information: First and last job | | P
bioresid | Generated biographical information: Occupancy and Second Residence | | P
biomarsm | Generated biographical information: marital history files, monthly | | P
biomarsy | Generated biographical information: marital history files, annual | | P
bioparen | Generated biographical information: Biography Information for the Parents of SOEP-Respondents | | P
biosib | Generated biographical information: Information on siblings | | P
biosoc | Generated biographical information: Retrospective Data on Youth and Socialization | | P
biotwin | Generated biographical information: Twins in the SOEP | | P
bioage17 | Generated biographical information: Data from the Youth Questionnaire | 2014/15 | P
bioagel | Generated biographical information: Data from the Mother & Child Questionnaires | 2014/15 | P
pbiospe | Generated biographical information: Activity Biography | | P
cirdef | Random Groups | | H
design | Survey Design | | H
kidlong | Data on children (from HH-Questionnaire, in long format) | | P
lifespell | Spell Information on the Pre- and Post-Survey History of SOEP-Respondents | | P
artkalen | Spell data from the activity calendar | | P
$$pequiv | Cross-national Equivalent File | 2013/14/15 | P
$$page17 | Questions from the Youth questionnaire not included in BIOAGE17 | 2014/15 | P
bepluecke | Short questionnaire of the year before (if missing) | 2015 | P
$$school | Data from the Pre-teen questionnaire (11-12 years old) | 2013/14/15 | P
$$vp | Data on the deceased person | 2014/15 | P
cogdj | Data on cognitive tests (Youth) | | P
hbrutt$$ | Original gross population of the sample specific first wave | 2013/15 |
pbr_exit | Cumulated Exit | | P
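
The wave-letter convention described under "Data structure" can be sketched in a few lines of Python. This is only an illustration, not part of the official SOEP tooling: the function names wave_prefix and expand_dataset are hypothetical, and the sketch assumes that a leading "$$" in the table above stands for the lowercase wave letters (so that "$$p" for 2013 becomes "bdp"). It does not cover names such as "hbrutt$$", where the placeholder is a suffix and may follow a different rule.

```python
import string
from typing import Iterable, List

FIRST_WAVE_YEAR = 1984  # wave "A", as stated in the data structure description


def wave_prefix(year: int) -> str:
    """Return the wave letters for a survey year (1984 -> "A", 2015 -> "BF")."""
    offset = year - FIRST_WAVE_YEAR
    letters = string.ascii_uppercase
    if offset < 0:
        raise ValueError("no wave before 1984")
    if offset < 26:
        # Waves "A" (1984) to "Z" (2009) use a single letter.
        return letters[offset]
    if offset < 52:
        # From 2010 on, the labels continue as "BA", "BB", ..., so 2015 is "BF".
        return "B" + letters[offset - 26]
    # The text documents labels only up to "BF"; later schemes are not assumed.
    raise ValueError("wave label beyond 'BZ' is not covered by this sketch")


def expand_dataset(template: str, years: Iterable[int]) -> List[str]:
    """Replace a "$$" placeholder with the lowercase wave letters (assumption)."""
    return [template.replace("$$", wave_prefix(y).lower()) for y in years]


if __name__ == "__main__":
    print(wave_prefix(2015))
    print(expand_dataset("$$p", [2013, 2014, 2015]))
```

Under these assumptions, the three survey years of the migration samples (2013, 2014, 2015) correspond to the wave letters BD, BE and BF.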

Missing conventions

Survey variables might be missing, i.e. without a valid code or value for different reasons. In the SOEP, negative values are not valid for any variable, but are used instead to code different reasons for missing information. There are two distinctions for missing values: they may originate in the respondent’s answer or in the survey design. The respondent may refuse or not know an answer or she may report invalid values on the one hand, and the interview design may exclude respondents with certain characteristics from some questions on the other (e.g. men will never be asked if they are pregnant). The following codes apply:

-1 no answer / don’t know
-2 does not apply
-3 implausible value
-4 inadmissible multiple response
-5 not included in this version of the questionnaire
-6 version of questionnaire with modified filtering

With the extension of the SOEP in recent years, entirely new samples have been added to the core. In these samples, sometimes questions are left out completely, e.g. to shorten the questionnaire or because the focus of the sample is different as in some of the related studies. In such a case, the variable will be set to “-5 Not included in this version of the questionnaire” for an entire subsample.

With the use of CAPI, recent developments include an “integrated” person questionnaire, i.e. the biography part and the “regular” part of the questionnaire are asked as one. Some of the questions in the biography part are repeated in the regular part. While in PAPI mode the respondent answers the same question twice, CAPI makes it possible to route the respondent past a question that has already been asked. These cases are very rare; if they occur, they receive the code “-6 Version of questionnaire with modified filtering”.
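
Because negative values never carry substantive information, analyses usually need to separate them from valid answers before computing statistics. The following minimal Python sketch shows one way to do this using the six codes listed above. The names MISSING_CODES and split_missing are hypothetical, and the example values are made up for illustration; any negative code not listed in this section is simply reported as undocumented.

```python
from collections import Counter

# Missing-value codes as listed in the "Missing conventions" section.
MISSING_CODES = {
    -1: "no answer / don't know",
    -2: "does not apply",
    -3: "implausible value",
    -4: "inadmissible multiple response",
    -5: "not included in this version of the questionnaire",
    -6: "version of questionnaire with modified filtering",
}


def split_missing(value):
    """Return (valid_value, missing_reason); exactly one of the two is None."""
    if value < 0:
        # Negative values are never valid; look up the documented reason.
        reason = MISSING_CODES.get(value, "undocumented code %d" % value)
        return None, reason
    return value, None


if __name__ == "__main__":
    # Made-up example column, e.g. a monthly income variable.
    raw = [2500, -1, -2, 0, -5, 1800]
    valid = [v for v, _ in map(split_missing, raw) if v is not None]
    reasons = Counter(r for _, r in map(split_missing, raw) if r is not None)
    print(valid)
    print(dict(reasons))
```

Keeping the reason alongside the value, rather than dropping all negatives at once, preserves the distinction between missingness caused by the respondent (codes -1, -3, -4) and missingness caused by the survey design (codes -2, -5, -6).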