Report of July 11, 2014
Surveys worldwide are increasingly linking respondent data with information from (governmental) administrative records. With these linked data, researchers hope to validate, extend, and refine their item sets and to reduce respondent burden. The SOEP has joined this promising development and seeks to link the detailed respondent data from the new SOEP Migration Sample with the precise and reliable information in the federal social insurance records. This linkage will create a unique dataset that enables high-quality studies of the interplay between migration-specific characteristics and individual labor biographies. In accordance with data protection legislation, a random sample of about two-thirds of the roughly 5,000 respondents in 2,700 households was asked to consent to data linkage after completing the questionnaire and being informed about the implications of consent. A second randomly selected group will be asked for consent during the second wave of the study, which is taking place this year.
About 50% of the selected respondents gave their written consent to record linkage. Because a relatively large proportion of respondents withheld consent, selectivity problems could arise: respondents who consented might differ significantly in some characteristics from those who did not. To test this possibility, we estimated various multivariate models of the probability of consenting to record linkage. The predictors covered origin and legal status, labor market position and outcomes, language and integration, general demographics, and household and interviewer characteristics.
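The report does not say which model family was used. As a minimal sketch, assuming a logistic regression of consent on binary covariates, the idea can be illustrated as follows. The variable names (`refused_income`, `low_qualified`), the simulated coefficients, and the data are all hypothetical and synthetic; nothing here reproduces SOEP data or results.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logit(X, y, lr=1.0, n_iter=500):
    """Fit a logistic regression by batch gradient ascent on the mean
    log-likelihood. X holds rows of covariates (no intercept column),
    y holds 0/1 outcomes. Returns [intercept, b1, b2, ...]."""
    n, k = len(X), len(X[0])
    beta = [0.0] * (k + 1)
    for _ in range(n_iter):
        grad = [0.0] * (k + 1)
        for row, yi in zip(X, y):
            xi = [1.0] + list(row)  # prepend the intercept term
            p = sigmoid(sum(b * x for b, x in zip(beta, xi)))
            for j in range(k + 1):
                grad[j] += (yi - p) * xi[j]
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta


# Synthetic respondents (assumption: two hypothetical binary covariates).
random.seed(0)
X, y = [], []
for _ in range(1000):
    refused_income = 1 if random.random() < 0.2 else 0  # item nonresponse on income
    low_qualified = 1 if random.random() < 0.3 else 0   # low-qualified position
    p_consent = sigmoid(0.3 - 1.5 * refused_income - 0.4 * low_qualified)
    X.append((refused_income, low_qualified))
    y.append(1 if random.random() < p_consent else 0)

beta = fit_logit(X, y)
# Odds ratios below 1 indicate lower odds of consent, other covariates held constant.
odds_ratios = [math.exp(b) for b in beta[1:]]
print("coefficients:", beta)
print("odds ratios:", odds_ratios)
```

In this simulation, the coefficient on `refused_income` comes out negative because the synthetic data were generated that way. In a real analysis one would use an established statistics package, which also reports standard errors and significance tests.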
Overall, we found relatively few significant effects in our models, which is a positive sign for the quality of the data. Holding other variables constant, some groups of respondents appear somewhat less likely to agree to record linkage than others: respondents without educational qualifications, respondents of Turkish origin or Arab cultural affiliation, and employees and civil servants in low-qualified positions. The strongest negative effect in our models came from a general reluctance to answer sensitive questions, such as those about the respondent’s income. We are currently assessing whether data users need to be provided with weights to correct for these selectivity problems.
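The report does not describe how such weights would be constructed. One common approach, shown here purely as an assumption, is inverse probability weighting: each consenting respondent is weighted by the reciprocal of their estimated probability of consenting, so that groups less likely to consent count more. The function name and the example probabilities below are hypothetical.

```python
def inverse_probability_weights(consent_probs):
    """Return weights proportional to 1/p for consenting respondents,
    rescaled so that the weights average to 1."""
    if any(not (0.0 < p <= 1.0) for p in consent_probs):
        raise ValueError("probabilities must lie in (0, 1]")
    raw = [1.0 / p for p in consent_probs]
    mean_raw = sum(raw) / len(raw)
    return [w / mean_raw for w in raw]


# Hypothetical estimated consent probabilities for three consenting respondents.
probs = [0.8, 0.5, 0.25]
weights = inverse_probability_weights(probs)
print(weights)
```

The respondent with the lowest estimated consent probability receives the largest weight. In practice, very small probabilities produce extreme weights, so weights are often trimmed or smoothed before release.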