
SOEP-Core v33 - Known Bugs/Fixes


Dataset Information

1984-2016 (Wave BG)

May 18, 2018

1. Dataset $PGEN: Variable casmin$$

A missing parenthesis in the code that generates the variable led to individuals in CASMIN category 6 ("(2c_gen) general maturity certificate") being mistakenly placed in CASMIN category 7 ("(2c_voc) vocational maturity certificate").

For wave BG, this means that of the 4,553 observations in category 7, 1,976 actually belong in category 6 and 2,577 in category 7.

This can be corrected with the existing variables in the $PGEN data. For wave BG, it can be done as follows:

replace casmin16 = 6 if inlist(bgpsbil,3,4) | bgpsbila==3 | bgpsbilo==3

replace casmin16 = 7 if (inlist(bgpsbil,3,4) | bgpsbila==3 | bgpsbilo==3) & (inlist(bgpbbila,2,3,5,6,8) | (bgpbbil01>=1 & bgpbbil01<.) | (bgpbbilo>=1 & bgpbbilo<.))

replace casmin16 = 8 if inlist(bgpbbil02,1,4)

replace casmin16 = 9 if inlist(bgpbbil02,2,3,5,6,7,8) | inlist(bgpbbila,4,7,9)
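To confirm that the correction behaved as intended, it can help to keep a copy of the shipped variable and cross-tabulate it against the corrected one. The following is a minimal sketch, assuming the wave BG file is available as bgpgen.dta in the working directory; the backup name casmin16_v33 is hypothetical and not part of the SOEP data.

```stata
* Sketch only: assumes bgpgen.dta is in the working directory and that
* casmin16 still holds the values shipped with v33.
use bgpgen, clear

* Keep an untouched copy of the shipped variable (hypothetical name).
clonevar casmin16_v33 = casmin16

* ... run the four replace statements for wave BG here ...

* Compare the shipped and the corrected coding, including missing values.
tabulate casmin16_v33 casmin16, missing

* Observations moved from category 7 to category 6 by the correction.
count if casmin16_v33 == 7 & casmin16 == 6

* Observations in category 7 after the correction.
count if casmin16 == 7
```

The two counts can then be compared with the figures quoted above for wave BG (1,976 observations moving to category 6, 2,577 remaining in category 7).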

2. Dataset [BE-BG]PGEN: Variable [be-bg]pbbila ("Vocational Degree Outside Germany")

The variable $pbbila (foreign degrees – vocational education) was expanded retrospectively in SOEP v33 to include information on whether the degree had been completed. This revision failed, however, to take into account some of the information collected in certain modules. A correction can be made with the existing variables in the $PGEN data, as shown in the accompanying statement.

Dataset   Variable   Variable Label
bepgen    bepbbila   Vocational Degree Outside Germany
bfpgen    bfpbbila   Vocational Degree Outside Germany
bgpgen    bgpbbila   Vocational Degree Outside Germany

3. Dataset BIOAGEL: Variable bioage

In the dataset BIOAGEL, the storage type of the variable bioage was not adjusted. The variable shows which questionnaire each row of data was taken from. Since v33, bioage has included values > 99, and because the storage type was left unchanged, these values were cut off in Stata. The affected values are:

Variable   Value   Label
bioage     101     "bioage10a"
bioage     102     "bioage10b (only FID)"
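A quick way to see whether a local copy is affected is to inspect the storage type and the distribution of bioage. The sketch below assumes the file is available as bioagel.dta in the working directory and that the variable was stored as byte, which the note above does not state explicitly; Stata's byte type holds integers only from -127 to 100, so 101 and 102 cannot be represented in it.

```stata
* Sketch only: assumes bioagel.dta is in the working directory.
use bioagel, clear

* Show the storage type (byte, int, ...) and the attached value label.
describe bioage

* List all codes that are present, including missing values.
tabulate bioage, missing

* Count rows that carry the two codes named in the table above.
count if inlist(bioage, 101, 102)
```

A count of zero in the last line would suggest that the file still contains the truncated variable. Recasting to a wider type does not bring back values that were already cut off, so an affected file would need to be replaced by a corrected release.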

4. Dataset CIRDEF: Variable rgroup

The variable rgroup divides the SOEP sample into 20 equally sized groups. It is used to select the 50% sample. Since the new samples M3 and M4 were incorrectly assigned, there are no cases from these samples in the teaching version of the SOEP data.

January 30, 2018

Various updates forced us to distribute a new version. Please see the 'Changes in the Dataset' page for the documentation of the changes.