KonsortSWD - Consortium for the Social, Behavioural, Educational and Economic Sciences in the National Research Data Infrastructure (NFDI)

Current Project

Project Management

Christof Wolf, GESIS (spokesperson)
Representatives of SOEP:
Stefan Liebig (from 2021 on)
Jürgen Schupp (up to 2020)
Jan Goebel

Project Period

October 1, 2020 - September 30, 2025

Funded by

Deutsche Forschungsgemeinschaft (DFG)

In Cooperation With

GESIS - Leibniz Institute for the Social Sciences
DIPF - Leibniz Institute for Research and Information in Education
DZHW - German Centre for Higher Education and Science Research
LIfBi - Leibniz Institute for Educational Trajectories
SOFI - Sociological Research Institute Göttingen at the Georg-August-Universität
University of Bremen – Qualiservice Research Data Center
University of Duisburg-Essen – Chair of Public and Regional Policy
WZB - Berlin Social Science Center
ZBW - Leibniz Information Centre for Economics
ZPID - Leibniz Institute for Psychology

Researchers in the social, behavioral, educational, and economic sciences work with many types of data, including data that are considered particularly sensitive due to legal or ethical restrictions and data that were not originally collected for research purposes.

KonsortSWD aims to assist researchers working together on multi- and interdisciplinary projects to implement research data management (RDM) plans. The institutions in KonsortSWD are contributing their experience in the operation of user-oriented research data infrastructures to the National Research Data Infrastructure (NFDI) in order to strengthen, expand, and deepen a research data infrastructure for the study of human society. The project is primarily user-driven and addresses the needs of the research communities involved. The core of KonsortSWD’s RDM strategy is to provide researchers and research data centers (RDCs) with the tools and services they need for managing and sharing (new) sensitive and non-sensitive data in compliance with the FAIR principles for scientific data management and stewardship. This will include supporting sustainable RDM in all phases of the research data lifecycle and ensuring data accessibility, while taking ethical and legal considerations into account.

SOEP is coordinating Task Area 3 “Data Production” and is responsible for the individual measures TA2.M2 (RDCnet) and TA3.M5 (Open Data Format).

DIW Team

Coordination of Task Area 3 (Data Production)

Management: Stefan Liebig
Coordination: Janina Britzke

TA3.M1: Harmonized Variables
TA3.M2: Qualitative Research Data Management (RDM)
TA3.M3: Textual Data
TA3.M4: Open Response Coding
TA3.M5: Open Data Format

The five subprojects listed above focus on the following areas of research and service provision:

  • interoperability and reusability of survey data through ex-ante/ex-post harmonization
  • standards for research data management (RDM) for qualitative data
  • unstructured text data use and linkage with standardized survey data
  • efficiency and quality of coding for (semi-)open response formats
  • non-proprietary data formats and long-term archiving

In the years to come, we aim to provide additional benefits to the community of data producers by making the standards and tools of research data management sustainable and by improving long-term archiving. For data users, the quantity of available data and the range of different data types will be expanded by enabling linkage of data types and opening up new possibilities for the use of existing data.

Measure TA2.M2 (Creating Single Points of Access for Sensitive Data in RDCs)

Project lead: Jan Goebel
Collaborators: Neil Murray, N.N.

To create optimal conditions for empirical research, data access must be both easy and secure. Researchers can normally access anonymized microdata after signing a contract with the data provider, but detailed, weakly anonymized data can only be used on-site at guest researcher workstations, which often costs considerable time and money. Improving access to sensitive data is therefore important for maximizing research potential.

KonsortSWD Measure TA2.M2 aims to close this gap. It will establish a research data infrastructure network (RDCnet) that connects guest researcher workstations at the participating research data centers into a network of secure data access points. This will enable researchers to access sensitive data from any of the participating guest researcher workstations. By improving ease of access, the measure will increase the number of data users, while leaving control over the ultimate distribution of the datasets with the data providers to ensure adherence to their individual standards of data security.

Measure TA3.M5 (Open, Metadata-Enriched, Non-Proprietary Data Format for Data Dissemination)

Project lead: Knut Wenzig
Collaborators: Xiaoyao Han, Claudia Saalbach

The principles of good scientific practice require that the steps of the research process as well as the materials used or produced are clearly documented and made accessible for subsequent use. During the research data lifecycle, numerous documents are produced that document the research process (e.g., study design descriptions, questionnaires, codebooks, descriptive summaries, data analysis replication code). Ideally, each of these documents should be findable, accessible, interoperable, and reusable. One way of meeting these criteria is through the use of metadata to organize the research process. Social scientists use different and sometimes proprietary data analysis software packages that process metadata in different ways. In some cases, the metadata cannot be accessed through the data file itself but only through PDF files or webpages. The data formats used by different statistical software packages are only partially compatible with one another, which presents an obstacle for replication studies. Proprietary data formats in particular jeopardize the FAIR principle of interoperability.

The goal of this project is (A) to develop an open, non-proprietary, multilingual, metadata-enriched data format that (B) can be used with common statistical programs and that also enables access to the metadata. The data products will be described directly by the metadata; they will be more accessible and interoperable and will re-use upstream metadata. The project will also approach other communities that use metadata in order to take their software or metadata schema requirements into consideration and thus to expand the user base for the new data format. Specifications and software, including source code, will be released as free/libre and open source software (FLOSS) under an open license (e.g., CC, MIT, LGPL), making the products easily usable in different contexts.
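
As a rough illustration of what "metadata-enriched" and "multilingual" could mean in practice, the following Python sketch pairs a plain CSV data file with a JSON metadata record that carries variable and value labels in more than one language. The layout, the field names (`variables`, `label`, `values`), the sample data, and the helper `labelled_rows` are assumptions made for this example only; they are not the format specified by the project.

```python
import csv
import io
import json

# Hypothetical data file: plain CSV, readable by any statistical program.
DATA_CSV = "pid,sex\n1,1\n2,2\n"

# Hypothetical metadata record: labels in several languages, keyed by variable name.
METADATA_JSON = """
{
  "variables": {
    "pid": {"label": {"en": "Person ID", "de": "Personen-ID"}},
    "sex": {
      "label": {"en": "Sex", "de": "Geschlecht"},
      "values": {
        "1": {"en": "male", "de": "männlich"},
        "2": {"en": "female", "de": "weiblich"}
      }
    }
  }
}
"""


def labelled_rows(data_csv: str, metadata_json: str, lang: str) -> list[dict[str, str]]:
    """Return the rows with variable and value labels applied in the chosen language."""
    variables = json.loads(metadata_json)["variables"]
    rows = []
    for row in csv.DictReader(io.StringIO(data_csv)):
        labelled = {}
        for name, raw in row.items():
            meta = variables.get(name, {})
            # Fall back to the raw name or code when no label exists.
            column = meta.get("label", {}).get(lang, name)
            value = meta.get("values", {}).get(raw, {}).get(lang, raw)
            labelled[column] = value
        rows.append(labelled)
    return rows
```

Calling `labelled_rows(DATA_CSV, METADATA_JSON, "de")` would apply the German labels, while an unknown language code falls back to the raw variable names and codes. Keeping the data in a plain-text file and the labels in a separate, structured record is one way to avoid depending on any single statistical package.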

Project outcomes will include the following:

  1. Specification and documentation of a uniform metadata schema (KonsortSWD Metadata Schema) in consultation with the KonsortSWD research data centers.
  2. Technical integration of the metadata schema:
     a) Development of a conversion filter that converts individual metadata structures into the KonsortSWD metadata schema.
     b) Development of import filters for common statistical programs so that the KonsortSWD metadata schema can be used in dataset labeling and data management.
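
A conversion filter of the kind described in (a) can be thought of as a mapping from an institution's own metadata fields onto the shared schema. The Python sketch below is illustrative only: the source field names (`varname`, `varlabel`, `codes`), the target field names (`name`, `label`, `value_labels`), the sample record, and the function `convert_variable` are all hypothetical, since the KonsortSWD Metadata Schema itself is not reproduced here.

```python
# Hypothetical mapping from one RDC's field names to the shared schema's field names.
FIELD_MAP = {
    "varname": "name",
    "varlabel": "label",
    "codes": "value_labels",
}


def convert_variable(source: dict, field_map: dict[str, str] = FIELD_MAP) -> dict:
    """Rename known fields; collect unmapped ones so nothing is silently dropped."""
    target = {}
    unmapped = {}
    for key, value in source.items():
        if key in field_map:
            target[field_map[key]] = value
        else:
            unmapped[key] = value
    if unmapped:
        target["unmapped"] = unmapped
    return target


# Hypothetical source record from an individual metadata structure.
record = {"varname": "sex", "varlabel": "Sex", "codes": {"1": "male"}, "note": "internal"}
converted = convert_variable(record)
```

An import filter as described in (b) would then read records in the shared schema and apply the labels inside a given statistical program. Collecting unmapped fields under a separate key, rather than discarding them, is a design choice made here so that gaps in the mapping stay visible.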


Janina Britzke

Staff Member in the Knowledge Transfer Division of the German Socio-Economic Panel Study Department