SOEP data are available for university teaching. Depending on your teaching needs, we now provide two datasets: a practice dataset and the comprehensive SOEP teaching dataset.
The practice dataset, provided in Stata format, is based on the original SOEP data but in a significantly altered and fully anonymized form, so it can be used independently of data distribution contracts and user agreements. It consists of 26 original variables and 12,922 measurements, covers five time points, and is available in “SOEPlong” format. The dataset is provided in German and English.
The variables were altered with an algorithm that largely preserves the longitudinal information in the original data. The practice dataset is therefore well suited to computing panel-specific univariate statistics (intra- and inter-individual correlation patterns, transition rates) in courses on descriptive methods. The appropriate commands in modern statistical packages, such as Stata's xt family, yield realistic results.
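To illustrate what a transition rate in long-format panel data is, here is a minimal Python sketch. It assumes a hypothetical layout of (person ID, wave, state) tuples, one record per person and time point; the names and states are invented for illustration and are not the practice dataset's variables. In Stata, the xt family (for example xttrans) provides this directly.

```python
from collections import Counter, defaultdict


def transition_rates(rows):
    """Share of transitions from each state to each next state.

    rows: iterable of (pid, wave, state) tuples in long format, one record
    per person and time point. This tuple layout is a hypothetical example,
    not the practice dataset's actual variable names.
    """
    # Collect each person's observations.
    by_person = defaultdict(list)
    for pid, wave, state in rows:
        by_person[pid].append((wave, state))

    counts = defaultdict(Counter)
    for obs in by_person.values():
        obs.sort()  # order each person's records by wave
        # Count a transition only between adjacent waves, so a missing
        # wave does not produce a spurious transition.
        for (w0, s0), (w1, s1) in zip(obs, obs[1:]):
            if w1 - w0 == 1:
                counts[s0][s1] += 1

    # Turn the counts for each origin state into shares that sum to 1.
    return {
        s0: {s1: n / sum(c.values()) for s1, n in c.items()}
        for s0, c in counts.items()
    }


# Hypothetical example: employment status of two persons over three waves.
data = [
    (1, 1, "employed"), (1, 2, "employed"), (1, 3, "unemployed"),
    (2, 1, "unemployed"), (2, 2, "employed"), (2, 3, "employed"),
]
rates = transition_rates(data)
# rates["employed"]   -> employed about 0.667, unemployed about 0.333
# rates["unemployed"] -> employed 1.0
```

Restricting transitions to adjacent waves is a design choice: with unbalanced panels, linking non-adjacent observations would mix one-wave and multi-wave transitions.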
For multivariate analysis, the dataset is useful for teaching (panel) regression techniques. Here, too, the properties of panel data and the impact of different analysis procedures (e.g., fixed effects, random effects) can be demonstrated realistically with the appropriate commands. Despite its limitations, the practice dataset can also be used to illustrate interaction and mediation analyses. Numerous example analyses using the dataset can be found in the textbook 'Regressionsmodelle zur Analyse von Paneldaten' (Marco Giesselmann and Michael Windzio, Springer VS).
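To make the contrast between analysis procedures concrete, here is a minimal Python sketch comparing a pooled OLS slope with a fixed-effects (within) slope on a tiny, invented panel. The data are not taken from the practice dataset, and random effects is not shown; in Stata, such estimates would typically come from xtreg with the fe or re option.

```python
from collections import defaultdict


def pooled_slope(rows):
    """OLS slope of y on x, ignoring that rows belong to the same persons."""
    n = len(rows)
    mx = sum(x for _, x, _ in rows) / n
    my = sum(y for _, _, y in rows) / n
    sxy = sum((x - mx) * (y - my) for _, x, y in rows)
    sxx = sum((x - mx) ** 2 for _, x, _ in rows)
    return sxy / sxx


def within_slope(rows):
    """Fixed-effects (within) slope: demean x and y per person, then OLS.

    Demeaning removes every time-constant person-specific level, so only
    change within persons over time identifies the slope. Raises
    ZeroDivisionError if x never varies within any person.
    """
    totals = defaultdict(lambda: [0.0, 0.0, 0])  # pid -> [sum x, sum y, count]
    for pid, x, y in rows:
        t = totals[pid]
        t[0] += x
        t[1] += y
        t[2] += 1
    means = {pid: (sx / k, sy / k) for pid, (sx, sy, k) in totals.items()}

    sxy = sxx = 0.0
    for pid, x, y in rows:
        mx, my = means[pid]
        sxy += (x - mx) * (y - my)
        sxx += (x - mx) ** 2
    return sxy / sxx


# Hypothetical data (pid, x, y): y = person level + 0.5 * x, where the person
# with the higher level has the lower x values, so level and x are correlated.
rows = [
    (1, 1.0, 10.5), (1, 2.0, 11.0), (1, 3.0, 11.5),
    (2, 4.0, 2.0), (2, 5.0, 2.5), (2, 6.0, 3.0),
]
print(pooled_slope(rows))  # about -2.07: distorted by the person levels
print(within_slope(rows))  # 0.5: the within-person effect
```

The pooled estimate even has the wrong sign here because the unobserved person levels are correlated with x; the within transformation removes them.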
Under no circumstances should the practice dataset be used for real analyses: because of the procedure used to alter the data, they only roughly reflect the actual relationships in the SOEP. Moreover, since the practice dataset contains only a very narrow segment of the original data, data preparation techniques can be taught and practiced with it only to a very limited extent. For such purposes, the comprehensive SOEP teaching dataset remains indispensable.
Please note that data protection regulations require a data distribution contract with DIW Berlin before you can use the comprehensive SOEP teaching dataset. The contract holder is responsible for ensuring strict adherence to data protection rules and should therefore make sure that all students and colleagues comply with the relevant data protection laws.
German data protection law stipulates that at most 50% of all cases may be used for teaching. This subset is easy to select with the random group variable RGROUP20, found in the dataset CIRDEF, which takes exactly 20 values and thus divides the data into 20 subsamples. Only cases with values 11 to 20 may be used for teaching.
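The selection rule can be sketched in Python as follows: look up each person's RGROUP20 value in CIRDEF, keep the IDs in groups 11 to 20, and restrict the other data to those IDs. This is a minimal sketch, not a substitute for the official SPSS, Stata, or SAS scripts; the person ID name "pid", the dictionary-per-record layout, and the exact spelling of RGROUP20 in your files are assumptions.

```python
# Random groups 11-20 are the permitted half: 10 of the 20 groups = 50%.
ALLOWED_GROUPS = range(11, 21)


def allowed_ids(cirdef_rows, id_var="pid", group_var="RGROUP20"):
    """Person IDs whose random group (from CIRDEF) may be used for teaching.

    id_var="pid" is a hypothetical name; use your files' person ID variable.
    """
    return {
        row[id_var]
        for row in cirdef_rows
        if row.get(group_var) in ALLOWED_GROUPS
    }


def teaching_subset(rows, ids, id_var="pid"):
    """Keep only records of persons in the permitted random groups.

    Persons missing from CIRDEF are never in ids, so they are dropped.
    """
    return [row for row in rows if row[id_var] in ids]


# Hypothetical records; "pid" and "wave" are placeholder names.
cirdef = [
    {"pid": 101, "RGROUP20": 3},
    {"pid": 102, "RGROUP20": 11},
    {"pid": 103, "RGROUP20": 20},
    {"pid": 104, "RGROUP20": 10},
]
data = [
    {"pid": 101, "wave": 1},
    {"pid": 102, "wave": 1},
    {"pid": 103, "wave": 2},
]
ids = allowed_ids(cirdef)              # {102, 103}
teaching = teaching_subset(data, ids)  # records of persons 102 and 103
```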
If you have a contract to use the full dataset, you can generate the teaching version yourself: please run the following scripts for SPSS, Stata, or SAS, which create the teaching version from your installed copy of the original data.
For data protection reasons, students in your classes are under no circumstances permitted to access the data in random groups 1 to 10 (if you use the 95% subsample, no case will have the value 1). Access to the original dataset is, of course, also prohibited.
The comprehensive SOEP teaching dataset for your students should be installed in a separate storage area to which the contract holder controls access. Students may not take any data home or install them anywhere else at the university.
Information about the comprehensive SOEP teaching dataset is available as a PDF file (380.38 KB).