Researchers in the social, behavioral, educational, and economic sciences work with different types of data that are considered particularly sensitive due to legal or ethical restrictions and that were not originally collected for research purposes.
KonsortSWD aims to assist researchers working together on multi- and interdisciplinary projects to implement research data management (RDM) plans. The institutions in KonsortSWD are contributing their experience in the operation of user-oriented research data infrastructures to the National Research Data Infrastructure (NFDI) in order to strengthen, expand, and deepen a research data infrastructure for the study of human society. The project is primarily user-driven and addresses the needs of the research communities involved. The core of KonsortSWD’s RDM strategy is to provide researchers and research data centers (RDCs) with the tools and services they need for managing and sharing (new) sensitive and non-sensitive data in compliance with the FAIR principles for scientific data management and stewardship. This will include supporting sustainable RDM in all phases of the research data lifecycle and ensuring data accessibility, while taking ethical and legal considerations into account. Further information can be found on the consortium's website, and the first publications from the individual measures are available on the multidisciplinary repository Zenodo.
SOEP is coordinating Task Area 3 “Data Production” and is responsible for the individual measures TA2.M2 (RDCnet) and TA3.M5 (Open Data Format).
Management: Jan Goebel
Coordination: Janina Britzke
TA3.M1: Harmonized Variables - Combining Survey Data more Easily through Standardised and Harmonised Variables
TA3.M2: Supporting Research Data Centers
TA3.M3: Linking Textual Data
TA3.M4: CODI – A service for coding open responses in surveys
TA3.M5: Open Data Format
The five subprojects listed above focus on the following areas of research and service provision:
In the years to come, we aim to provide additional benefits to the community of data producers by making the standards and tools of research data management sustainable and by improving long-term archiving. For data users, the quantity of available data and the range of different data types will be expanded by enabling linkage of data types and opening up new possibilities for the use of existing data.
Project lead: Jan Goebel
Collaborators: Neil Murray, Kenny Pedrique
To create optimal conditions for empirical research, it is crucial that data access is both easy and secure. Researchers are normally able to access anonymized microdata after signing a contract with the data provider. Detailed, weakly anonymized data, however, can only be used on-site at guest researcher workstations, which often costs researchers considerable time and money. Improving access to sensitive data is therefore important for maximizing research potential.
KonsortSWD Measure TA2.M2 aims to close this gap. It will establish a research data infrastructure network (RDCnet) connecting guest researcher workstations at the participating research data centers in a network of secure data access points. This will enable researchers to access sensitive data from any of the participating guest researcher workstations. By improving ease of access, the measure will increase the number of data users, while leaving control over the ultimate distribution of the datasets with the data providers to ensure adherence to their individual standards of data security.
Project lead: Knut Wenzig
Collaborators: Xiaoyao Han, Claudia Saalbach
The principles of good scientific practice require that the steps of the research process as well as the materials used or produced are clearly documented and made accessible for subsequent use. During the research data lifecycle, numerous documents are produced that document the research process (e.g., study design descriptions, questionnaires, codebooks, descriptive summaries, replication code for data analyses). Ideally, each of these documents should be findable, accessible, interoperable, and reusable. One way of meeting these criteria is through the use of metadata to organize the research process. Social scientists use different and sometimes proprietary data analysis software packages that process metadata in different ways. In some cases, the metadata cannot be accessed through the data file itself but only through PDF files or webpages. The data formats used by statistical software packages are only partially compatible with one another, which presents an obstacle for replication studies. Proprietary data formats in particular jeopardize the FAIR principle of interoperability.
The goal of this project is (A) to develop an open, non-proprietary, multilingual, metadata-enriched data format that (B) can be used with common statistical programs and that also enables access to the metadata. The data products will be described directly by the metadata; they will be more accessible and interoperable and will re-use upstream metadata. The project will also approach other communities that use metadata in order to take their software and metadata schema requirements into consideration and thus to broaden the user base for the new data format. Specifications and software, including source code, will be provided as FLOSS under open licenses (e.g., CC, MIT, LGPL), making the products easily usable in different contexts.
Project outcomes will include the following:
a) Development of a conversion filter that can be used to convert individual metadata structures into the KonsortSWD metadata schema.
b) Development of import filters for common statistical programs so that the KonsortSWD metadata schema can be used in dataset labeling and data management.
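The two outcomes can be illustrated with a minimal Python sketch. It is not the KonsortSWD metadata schema or any project software: the `Variable` structure, the layout of the source metadata, and the function names `convert_metadata`, `merge_languages`, and `label_rows` are all assumptions made up for this illustration. The sketch only shows the general idea of (a) converting a source-specific metadata structure into a common, multilingual schema and (b) using that schema to label coded values in a dataset.

```python
from __future__ import annotations

import copy
from dataclasses import dataclass, field


@dataclass
class Variable:
    """One variable in a hypothetical common metadata schema."""
    name: str
    # language code -> variable label
    labels: dict[str, str] = field(default_factory=dict)
    # value code -> (language code -> value label)
    value_labels: dict[int, dict[str, str]] = field(default_factory=dict)


def convert_metadata(source: dict, language: str) -> dict[str, Variable]:
    """Conversion filter (sketch): map one source structure to the common schema.

    The source layout is an assumption:
    {"var_labels": {name: label}, "val_labels": {name: {code: label}}}
    """
    schema: dict[str, Variable] = {}
    for name, label in source.get("var_labels", {}).items():
        schema[name] = Variable(name=name, labels={language: label})
    for name, codes in source.get("val_labels", {}).items():
        var = schema.setdefault(name, Variable(name=name))
        for code, text in codes.items():
            var.value_labels.setdefault(int(code), {})[language] = text
    return schema


def merge_languages(base: dict[str, Variable],
                    extra: dict[str, Variable]) -> dict[str, Variable]:
    """Combine two converted schemas so each variable carries all languages."""
    merged = copy.deepcopy(base)
    for name, var in extra.items():
        target = merged.setdefault(name, Variable(name=name))
        target.labels.update(var.labels)
        for code, texts in var.value_labels.items():
            target.value_labels.setdefault(code, {}).update(texts)
    return merged


def label_rows(rows: list[dict], schema: dict[str, Variable],
               language: str) -> list[dict]:
    """Import filter (sketch): replace coded values with labels where available."""
    labeled = []
    for row in rows:
        out = {}
        for name, value in row.items():
            var = schema.get(name)
            texts = var.value_labels.get(value, {}) if var else {}
            # Fall back to the raw value if no label exists in this language.
            out[name] = texts.get(language, value)
        labeled.append(out)
    return labeled


# Invented example metadata in two languages.
english = {"var_labels": {"sex": "Sex of respondent"},
           "val_labels": {"sex": {"1": "male", "2": "female"}}}
german = {"var_labels": {"sex": "Geschlecht"},
          "val_labels": {"sex": {"1": "männlich", "2": "weiblich"}}}

schema = merge_languages(convert_metadata(english, "en"),
                         convert_metadata(german, "de"))
rows = [{"pid": 101, "sex": 1}, {"pid": 102, "sex": 2}]
print(label_rows(rows, schema, "de"))
```

Keeping labels keyed by language inside a single schema object means the same data file can be labeled in any available language at import time, which is one plausible way to read the "multilingual, metadata-enriched" goal described above.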