String Coding in a Generic Framework

Articles in Edited Volumes 2016

Manuel Munz, Knut Wenzig, Daniel Bela

In: Hans-Peter Blossfeld, Jutta von Maurice, Michael Bayer, Jan Skopek (Eds.), Methodological Issues of Longitudinal Surveys
Wiesbaden: Springer
pp. 709-726


For many questions in the social sciences that are supposed to be answered with survey data, reliable and detailed information about occupations is crucial. Because occupational classifications are extensive and complex, presenting the full scheme to respondents is not feasible. Instead, respondents answer an open-ended question, and the resulting string is later converted to an appropriate entry in a chosen classification. This article illustrates a generic coding framework for this task. The raw material, i.e., the strings-to-code (reported occupations) together with covariate information, must be prepared and fed into the coding process. The selected coding scheme has to meet several requirements, such as discriminatory power, completeness, and adequacy. The coding framework of the NEPS (National Educational Panel Study) also extends to other variables: the interface for exporting content-to-code from the NEPS dataset files is used beyond the coding of occupational information. Every NEPS survey developer who needs to classify string variables is provided with spreadsheets ready for the related workflow. Once coding is finished, the NEPS Data Center re-imports these spreadsheets into the dataset. Several further mechanisms have been integrated into this process to ensure high data quality.
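The abstract describes the export, coding and re-import round trip only in outline. The sketch below illustrates one possible version of it under assumptions of our own: CSV text stands in for the spreadsheets, identical strings with identical covariates are merged so each combination is coded once, and re-imported codes are checked against a set of valid codes as one example of a quality check. The functions `make_key`, `export_for_coding` and `reimport_codes`, the variable names and the codes `A1`/`B2` are all hypothetical; this is not the NEPS implementation.

```python
import csv
import io

# Hypothetical helpers illustrating an export -> code -> re-import round trip.

def make_key(rec, string_var, covariates):
    """Normalize the open string and stringify covariates so that keys
    built from survey records match keys read back from the CSV sheet."""
    return (rec[string_var].strip().lower(), *(str(rec[c]) for c in covariates))

def export_for_coding(records, string_var, covariates):
    """Write each distinct (string, covariates) combination once, with an
    empty 'code' column for the coders to fill in."""
    keys = sorted({make_key(r, string_var, covariates) for r in records})
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["string", *covariates, "code"])
    for key in keys:
        writer.writerow([*key, ""])
    return buf.getvalue()

def reimport_codes(records, string_var, covariates, sheet, valid_codes):
    """Read the coded sheet back, keep only codes found in the classification,
    and attach them to every record that shares the same key."""
    coded, rejected = {}, []
    for row in csv.DictReader(io.StringIO(sheet)):
        key = (row["string"], *(row[c] for c in covariates))
        code = row["code"].strip()
        if not code:
            continue  # not coded yet; matching records stay missing (None)
        if code in valid_codes:
            coded[key] = code
        else:
            rejected.append((key, code))  # flag for review instead of importing
    assigned = {r["id"]: coded.get(make_key(r, string_var, covariates)) for r in records}
    return assigned, rejected

# Toy data; "A1" and "B2" are placeholders, not real classification codes.
records = [
    {"id": 1, "occ": "Nurse ", "sex": "f"},
    {"id": 2, "occ": "nurse", "sex": "f"},
    {"id": 3, "occ": "Carpenter", "sex": "m"},
]
sheet = export_for_coding(records, "occ", ["sex"])
# A coder fills in the sheet; "ZZ" is deliberately not a valid code.
filled = "string,sex,code\ncarpenter,m,A1\nnurse,f,ZZ\n"
assigned, rejected = reimport_codes(records, "occ", ["sex"], filled, {"A1", "B2"})
print(assigned)  # {1: None, 2: None, 3: 'A1'}
print(rejected)  # [(('nurse', 'f'), 'ZZ')]
```

Keeping the covariates in the key means the same string can still receive different codes when its context differs, while merging exact duplicates reduces the number of rows a coder has to handle.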

Knut Wenzig

Research Associate in the German Socio-Economic Panel study Department