SOEP Archive for Reanalysis of Published Findings

The SOEP supports efforts in the scientific community to make data easily available for replication and reanalysis. At the same time, the SOEP is obligated to ensure that respondents’ data are used solely for scientific purposes. This means that data users have to sign a data distribution contract with DIW Berlin and are forbidden from disseminating any part of the data to third parties. To facilitate reanalysis and replication, the SOEP Research Data Center (SOEP-RDC) offers to archive the syntax that researchers used to prepare and analyze the data, and makes this syntax available for download from the SOEP-RDC website. To enable replication, the syntax should identify the version of the SOEP data used by its DOI.

Some journals require that researchers provide access to the dataset used in their research. To meet this demand, we also offer to archive users’ research datasets. The SOEP-RDC will provide the dataset upon request to researchers who have signed a data distribution contract with DIW Berlin.


Publications that provide information for reanalysis

a) with completely anonymized microdata

Winkelmann, Rainer. 2004. Health Care Reform and the Number of Doctor Visits - An Econometric Analysis. Journal of Applied Econometrics 19, no. 4, 455-472 (Pre-published: 2003: IZA DP No. 317. Bonn: Institute for the Study of Labor (IZA)). DOI: 10.1002/jae.764

Dataset and description

Riphahn, Regina T., Achim Wambach, and Andreas Million. 2003. Incentive Effects in the Demand for Health Care: A Bivariate Panel Count Data Estimation. Journal of Applied Econometrics 18, no. 4, 387-405. (Also published 2003: IZA Reprint Series A - 201/2003. Bonn: Institute for the Study of Labor). DOI: 10.1002/jae.680

Dataset and description

b) with dataset available for licensed SOEP users

Schmukle, Stefan C., Martin Korndörfer, and Boris Egloff. 2019. No evidence that economic inequality moderates the effect of income on generosity. Proceedings of the National Academy of Sciences (PNAS), advance online publication. DOI: 10.1073/pnas.1807942116.

Description and other data files

Fossen, Frank M., and Daniela Glocker. 2017. Stated and Revealed Heterogeneous Risk Preferences in Educational Choice. European Economic Review 97, 1-25. DOI: 10.1016/j.euroecorev.2017.03.016

Description | PDF, 219.28 KB

Bremhorst, Vincent, Michaela Kreyenfeld and Philippe Lambert. 2016. Fertility progression in Germany: An analysis using flexible nonparametric cure survival models. Demographic Research 35 (18), 505-534. DOI: 10.4054/DemRes.2016.35.18.

Buggle, Johannes C. 2016. Law and social capital: Evidence from the Code Napoleon in Germany. European Economic Review 87 (August 2016), 148-175. DOI: 10.1016/j.euroecorev.2016.05.003.

Kreyenfeld, Michaela, and Gunnar Andersson. 2014. Socioeconomic differences in the unemployment and fertility nexus: Evidence from Denmark and Germany. Advances in Life Course Research 21 (September 2014), 59-73. DOI: 10.1016/j.alcr.2014.01.007.

Spiess, Martin. 2010. Der Umgang mit fehlenden Werten. In Christof Wolf, and Henning Best (eds.), Handbuch der sozialwissenschaftlichen Datenanalyse, Chapter 6, 117-142. Wiesbaden: VS Verlag für Sozialwissenschaften.

Description | PDF, 6.93 KB

The dataset is not completely anonymized. Please contact the SOEP-RDC to gain access.


c) with syntax for use with the SOEP dataset

Neundorf, Anja and James Adams. 2018. The Micro-Foundation of Party Competition and Issue Ownership: The Reciprocal Effects of Citizens’ Issue Salience and Party Attachments. British Journal of Political Science 48 (2), 385-406. DOI: 10.1017/S0007123415000642

Stata syntax (Do-File)

Lersch, Philipp M. 2017. Individual Wealth and Subjective Financial Well-being in Marriage: Resource Integration or Separation? Journal of Marriage and Family 79, 1211-1223. DOI: 10.1111/jomf.12406

Replication files

Stiftung Familienunternehmen. 2016. Entwicklung der Einkommensungleichheit: Daten, Fakten und Wahrnehmungen. München: Stiftung Familienunternehmen.

Stata syntax (Do-File)

Leopold, Thomas, and Clemens M. Lechner. 2015. Parents' Death and Adult Well-being: Gender, Age, and Adaptation to Filial Bereavement. Journal of Marriage and Family 77 (3), 747-760. DOI: 10.1111/jomf.12186

Stata syntax (Do-File)

Leopold, Thomas, and Clemens M. Lechner. 2015. Religious Attendance Buffers the Impact of Unemployment on Life Satisfaction: Longitudinal Evidence from Germany. Journal for the Scientific Study of Religion 54 (1), 166-174. DOI: 10.1111/jssr.12171

Stata syntax (Do-File)

Schult, Johannes, Manuela Münzer-Schrobildgen, and Jörn R. Sparfeldt. 2014. Belastet, aber hochzufrieden? Arbeitsbelastung von Lehrkräften im Quer- und Längsschnitt. Zeitschrift für Gesundheitspsychologie, 22, no. 2, 61-67. DOI: 10.1026/0943-8149/a000114

Stata syntax (Do-File)

Schult, Johannes. 2012. Prädiktoren des Berufserfolgs von Hochschulabsolventen: Befunde aus dem Sozio-Ökonomischen Panel. Wirtschaftspsychologie, 14, no. 4, 82-91.

Stata syntax (Do-File)

Brüderl, Josef. 2010. Kausalanalyse mit Paneldaten. In Christof Wolf, and Henning Best (eds.), Handbuch der sozialwissenschaftlichen Datenanalyse, Chapter 36, 963-994. Wiesbaden: VS Verlag für Sozialwissenschaften.

Stata syntax (Do-File)

Gangl, Markus. 2010. Nichtparametrische Schätzung kausaler Effekte mittels Matchingverfahren. In Christof Wolf, and Henning Best (eds.), Handbuch der sozialwissenschaftlichen Datenanalyse, Chapter 35, 931-961. Wiesbaden: VS Verlag für Sozialwissenschaften.

Stata syntax (Do-File)

Scherer, Stefani, and Josef Brüderl. 2010. Sequenzdatenanalyse. In Christof Wolf, and Henning Best (eds.), Handbuch der sozialwissenschaftlichen Datenanalyse, Chapter 39, 1031-1051. Wiesbaden: VS Verlag für Sozialwissenschaften.

Stata syntax (Do-File)

Schmiedek, Florian, and Julia K. Wolff. 2010. Latente Wachstumskurvenmodelle. In Christof Wolf, and Henning Best (eds.), Handbuch der sozialwissenschaftlichen Datenanalyse, Chapter 38, 1017-1029. Wiesbaden: VS Verlag für Sozialwissenschaften.

SAS syntax

Biewen, Martin. 2009. Measuring state dependence in individual poverty histories when there is feedback to employment status and household composition. Journal of Applied Econometrics 24 (7), 1095-1116. DOI: 10.1002/jae.1081 (Pre-published 2004: DIW Discussion Paper No. 429 | PDF, 390.26 KB).

Stata syntax (Do-File)

Bellemare, Charles, Bertrand Melenberg, and Arthur van Soest. 2002. Semi-parametric Models for Satisfaction with Income. Portuguese Economic Journal 1 (2), 181-203. (Pre-published 2002: cemmap working paper CWP 12/02). DOI: 10.1007/s10258-002-0006-z

Description | PDF, 457.15 KB

Trede, Mark M. 1998. The Age Profile of Mobility Measures - An Application to Earnings in West Germany. Journal of Applied Econometrics 13, 397-409. DOI: 10.1002/(SICI)1099-1255(199807/08)13:4<397::AID-JAE482>3.0.CO;2-K

GAUSS syntax and programs


Detailed instructions for researchers

Data protection issues are of utmost importance to both SOEP and CNEF. First, data protection is a crucial part of the (implicit) contract between the surveys and their respondents. Second, researchers who want to access the survey data must adhere to strict data protection regulations. The precautions taken by the surveys and data users to guarantee data protection ultimately help to ensure future participation by respondents. Because of the exceptionally high standards of data protection that apply to SOEP and CNEF data, making them available for reanalysis can present a major challenge. The SOEP data are subject to limited access: they are provided solely for research purposes (wissenschaftliche Zweckbindung) and therefore only to members of the scientific community. To obtain the data, researchers must sign a data distribution contract with DIW Berlin. Users of SOEP data are not permitted to transfer the data to third parties or to other users not covered by the data distribution contract.

A growing number of scholarly journals that publish empirical papers based on microdata require that the data be submitted for archiving along with the paper itself. Two such journals are the Journal of Applied Econometrics and the American Economic Review. The latter has adopted the following policy: “For published articles, the authors must provide both the data and the programs sufficient for the articles’ findings to be replicated. These data and programs are then posted on the journal's Web site. If the use of the data is restricted, the authors must provide instructions on how to obtain permission to use the data. If some of the data are proprietary, the editors try to work out ways for other researchers to use the data. In addition, the journal is encouraging studies to reanalyze data and replicate results.” (Kleppner et al. 2009: 96-97). The SOEP is keen to support such policies.

To improve the statistical infrastructure for reanalysis and replication studies using SOEP data, the SOEP group offers users several options for making their SOEP working datasets available to other researchers. These options apply to all data formats associated with SOEP, including CNEF, EU-SILC-Clone, LIS, and LWS. If your working dataset includes any SOEP microdata (or data derived from SOEP), you as a SOEP user may not transfer the data to the journal’s editorial office; instead, you may use one of the following alternatives:

  • In our view, the most transparent approach is to publish the syntax used for the article, documenting how the data were processed. The syntax published on the journal webpage should clearly identify the SOEP data version used by the authors, ideally by its DOI. Other researchers who are interested in the data can obtain the identical version after signing a data distribution contract with the SOEP. Users can include a paragraph like the following in their article to describe this process:

    Data Availability: The data cannot be shared publicly because of third-party restrictions, but they are available from the German Socio-Economic Panel Study (SOEP) (for requests, please contact soepmail@diw.de). The scientific use file of the SOEP with anonymous microdata is made available free of charge to universities and research institutes for research and teaching purposes. The direct use of SOEP data is subject to the strict provisions of German data protection law. Therefore, signing a data distribution contract is a precondition for working with SOEP data. The data distribution contract can be requested with a form, available at: http://www.diw.de/soepforms. For further information, contact the SOEP hotline at either soepmail@diw.de or +49-30-89789-292.

  • If the journal to which you intend to submit a paper does not accept the approach above, we offer a second option: storing your working dataset in a special archive at DIW Berlin. In our experience, journals generally accept this arrangement. Researchers who want to reanalyze the dataset must apply for a standard SOEP user contract to be granted access to the archived data. Such a contract also includes access to all other SOEP scientific use files. If the dataset is exceptionally sensitive because it includes detailed geocoded data, users will probably be required to visit the SOEP-RDC in person to access the data.

  • SOEP data can be made available for free download only if two conditions are met. First, we (the SOEP group, together with the DIW Berlin data protection officer) must check whether the working dataset can be treated as a “completely anonymized dataset” (absolut anonymisierter Mikrodatensatz). This may be the case when both the number of observations and the number of variables are small and all original IDs have been removed. Second, the data must be altered, for instance by adding random error to metric variables and randomly interchanging categories of non-metric variables. In such cases, journals may be permitted to archive the data, but only with DIW Berlin’s official approval.
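The third option names two common ways of altering data before release. The Python sketch below illustrates both under stated assumptions: the working dataset is taken to be a list of records (dicts); metric variables receive zero-mean Gaussian noise scaled to each variable's standard deviation; and non-metric categories are randomly swapped for another observed category (a simplified relative of the post-randomization method, PRAM). The variable names (`person_id`, `income`, `health`), the noise scale `rel_sd`, and the swap probability `p_swap` are hypothetical choices for illustration only. This is not the procedure DIW Berlin applies, and running it does not replace the check by the SOEP group and the data protection officer.

```python
import random
from typing import Any, Dict, List, Optional, Sequence

Record = Dict[str, Any]


def add_noise(values: Sequence[float], rel_sd: float, rng: random.Random) -> List[float]:
    """Add zero-mean Gaussian noise whose SD is rel_sd times the variable's own SD."""
    n = len(values)
    mean = sum(values) / n
    sd = (sum((v - mean) ** 2 for v in values) / n) ** 0.5
    return [v + rng.gauss(0.0, rel_sd * sd) for v in values]


def swap_categories(values: Sequence[Any], p_swap: float, rng: random.Random) -> List[Any]:
    """With probability p_swap, replace each value by a different observed category."""
    # Sort so that a fixed seed gives reproducible results (set order can vary between runs).
    categories = sorted(set(values), key=str)
    out = []
    for v in values:
        if len(categories) > 1 and rng.random() < p_swap:
            out.append(rng.choice([c for c in categories if c != v]))
        else:
            out.append(v)
    return out


def perturb(
    records: Sequence[Record],
    metric: Sequence[str],
    categorical: Sequence[str],
    drop: Sequence[str] = (),
    rel_sd: float = 0.1,   # hypothetical noise scale, not an official threshold
    p_swap: float = 0.05,  # hypothetical swap probability, not an official threshold
    seed: Optional[int] = None,
) -> List[Record]:
    """Return a perturbed copy of `records` with the columns in `drop` (e.g. IDs) removed."""
    if not records:
        return []
    rng = random.Random(seed)
    keep = [k for k in records[0] if k not in drop]
    columns = {k: [r[k] for r in records] for k in keep}
    for k in metric:
        columns[k] = add_noise(columns[k], rel_sd, rng)
    for k in categorical:
        columns[k] = swap_categories(columns[k], p_swap, rng)
    # Build new dicts so the input records are left unchanged.
    return [{k: columns[k][i] for k in keep} for i in range(len(records))]


if __name__ == "__main__":
    # Hypothetical toy data: an ID, one metric and one non-metric variable.
    data = [
        {"person_id": i, "income": 1000.0 + 100 * i, "health": ["good", "fair", "poor"][i % 3]}
        for i in range(6)
    ]
    for row in perturb(data, metric=["income"], categorical=["health"], drop=["person_id"], seed=42):
        print(row)
```

Fixing `seed` makes a perturbation reproducible for internal checks; for an actual release one would presumably not publish the seed, since anyone who knows it could regenerate the noise and subtract it. Scaling the noise to each variable's standard deviation keeps the relative distortion comparable across variables measured in different units.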

Whenever a journal editor asks for your working dataset, please contact us. We would be happy to deposit the data in a special archive and to notify the journal editor about the access procedure.

To improve the infrastructure for reanalyzing published findings based on SOEP data, we also provide the following types of information:

  • references to publications using completely anonymized and partly artificial (SOEP) microdata (including links to the dataset),
  • references to publications using a working dataset deposited in our archive (as offered above) and available to licensed SOEP users, and
  • references to publications using the SOEP dataset and providing their syntax files.

References
Kleppner, Daniel, and Phillip A. Sharp. 2009. Research Data in the Digital Age. Science 325 (24 July 2009), 368.

Kleppner, Daniel, et al. [Committee on Ensuring the Utility and Integrity of Research Data in a Digital Age]. 2009. Ensuring the Integrity, Accessibility, and Stewardship of Research Data in the Digital Age. Washington, D.C.: The National Academies Press.