Archive for Re-analysis of Published Findings

Data protection issues are of utmost importance to SOEP and CNEF users as well. First, data protection comprises part of the (implicit) contract between the survey and the respondent. Second, in order to access the data, users are required to address data protection issues thoroughly. Ultimately, all these precautions are crucial to ensure future participation by panel respondents. As such, making SOEP and CNEF data available for re-analyses while maintaining the highest levels of data protection can present a major challenge. Whenever such a microdata set is not considered completely anonymous from a legal point of view, we, as data producers, are not permitted to allow archiving without setting and guaranteeing adherence to clear-cut access regulations.

A growing number of scholarly journals that publish empirical papers using microdata stipulate that the microdata be submitted for archiving along with the paper itself. These include the Journal of Applied Econometrics and the American Economic Review. The latter recently adopted the following policy: "For published articles, the authors must provide both the data and the programs sufficient for the articles' findings to be replicated. These data and programs are then posted on the journal's Web site. If the use of the data is restricted, the authors must provide instructions on how to obtain permission to use the data. If some of the data are proprietary, the editors try to work out ways for other researchers to use the data. In addition, the journal is encouraging studies to reanalyze data and replicate results." (Kleppner et al., 2009: pp. 96-97).

Being very much in favor of improving the statistical infrastructure for re-analyses and replication studies using SOEP data, the SOEP group now offers users the following opportunities to make their "SOEP working data set" available to other researchers. This includes all data formats associated with SOEP, such as CNEF, ECHP, LIS, and LWS. In particular, we offer a solution for all those cases where we cannot allow storage of SOEP microdata in a journal's editorial office because the microdata in question are not considered "completely anonymized".

  • In any case, we (the SOEP group, together with the data protection officer of DIW Berlin) have to check whether the relevant working data set can be treated as a "completely anonymized data set" (the German term is "absolut anonymisierter Mikrodatensatz"). This may be the case when the numbers of observations and variables are both small and all original IDs have been removed. In such cases, the archiving of the data by a journal is possible, but still requires DIW Berlin's official approval.
  • We offer the service of depositing working data sets that cannot be considered completely anonymized in a special archive at DIW Berlin. In our experience, journals generally accept this arrangement. Whoever wants to re-analyze such a data set must apply for a standard SOEP user contract in order to be granted access to the archived data. Of course, such a contract includes access to the SOEP Scientific Use File as well. If the data set is exceptionally sensitive because it includes detailed geo-coded data, access for re-analysis will most likely require visiting the SOEP-RDC. Finally, as an additional service, we also store data sets in this archive that are subject to less stringent restrictions (see above).

Whenever a journal editor asks for your working data set, please contact us. We will check whether your data set can be treated as completely anonymized. If so, you can give the data set to the journal for archiving, and you may also ask us to archive it. If we rate your data set as not completely anonymized, we offer to deposit it in our special archive and to notify the journal editor about this and the access procedure.

In order to improve the infrastructure for the re-analysis of published findings based on SOEP data, we also provide information of the following types:

  • references to publications using completely anonymized (SOEP) microdata (including links to the dataset)
  • references to publications using a working dataset deposited at our archive (as offered above), available to licensed SOEP users
  • references to publications using the SOEP dataset and providing the generated syntax files.

Kleppner, Daniel and Phillip A. Sharp (2009): Research Data in the Digital Age. Science, Vol. 325: 368, 24 July 2009.

Kleppner, Daniel et al. [Committee on Ensuring the Utility and Integrity of Research Data in a Digital Age] (2009): Ensuring the Integrity, Accessibility, and Stewardship of Research Data in the Digital Age. The National Academies Press, Washington, D.C.