1984-2018 (Wave BI)
(as of April 2020)
Dataset: bioage; variable clref
We detected a label error in the variable clref of the dataset bioage that could be misleading when analyzing the data. The labels for values [1] and [2] need to be switched.
stata [de]
label def clref ///
1 "[1] Ja, sowohl spez. Klasse als auch Regelunterricht" ///
2 "[2] Ja, ausschliessl. spez. Klasse fuer gefluechtete Kinder", modify
stata [en]
label def clref ///
1 "[1] Yes, both special class and regular classes" ///
2 "[2] Yes, only special class for refugee children", modify
spss [de]
add value labels clref 1 '[1] Ja, sowohl spez. Klasse als auch Regelunterricht' 2 '[2] Ja, ausschliessl. spez. Klasse fuer gefluechtete Kinder' .
spss [en]
add value labels clref 1 '[1] Yes, both special class and regular classes' 2 '[2] Yes, only special class for refugee children' .
1984-2017 (Wave BH)
Overview (May 2019):
Values for the variables plb0186_v2 and plb0186_h for the East sample in 1990 are too small by a factor of 10.
The names assigned to the raw variables bhh_37_01 “electricity included in rent” and bhh_37_02 “assessed burden of housing expenses (rent and additional expenses)” do not correspond to the standard SOEP concept for naming variables. Both variables will be renamed in the new version.
An outdated (previous) version of the migspell dataset was delivered.
In several datasets, the new identifiers (pid, hid, cid) were created but not filled in; they have to be filled in from the old identifiers.
1. Dataset: pl
Variables: plb0186_v2, plb0186_h
Values for the variables plb0186_v2 “Actual working time with overtime (1990-2017)” and plb0186_h “Actual working time with overtime (harmonized)” have the wrong values for the East sample in 1990.
The variable plb0186_h is made up of the variables plb0186_v1 (1984-1989) and plb0186_v2 (1990-2017). We included all values of plb0186_v1 unchanged and divided all valid values of plb0186_v2 by 10. This harmonization is necessary because the two raw variables for 1990 were provided in different formats:
gpost: gp3601e (two digits, no decimal separator)
gp: gp39 (three digits, no decimal separator)
The raw variable gp3601e from gpost was assigned to the variable plb0186_v2 although it does not have to be divided by 10. As a result, all values for the East German population for the year 1990 were mistakenly divided by 10. The simplest way of solving this problem is to multiply the valid values for the East German population by 10.
cd "Datenpfad"
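The correction described above (multiplying the valid 1990 values of the East German sample by 10) can be sketched as follows. This is a minimal illustration in Python, not the original correction script. The record layout, the survey-year field `syear`, the flag `east_sample`, and the rule that only values greater than zero count as valid are assumptions made for the example. How the East sample is identified depends on the sample information available in your own data.

```python
# Hypothetical sketch: undo the erroneous division by 10 for the 1990 East sample.
# Only plb0186_v2 and plb0186_h come from the text; all other names are assumed.

AFFECTED = ("plb0186_v2", "plb0186_h")


def fix_working_time(rows):
    """Return copies of the rows with valid affected values multiplied by 10."""
    fixed = []
    for row in rows:
        new_row = dict(row)  # copy, so the input rows are left untouched
        if row["syear"] == 1990 and row["east_sample"]:
            for var in AFFECTED:
                value = row[var]
                # Assumption: values <= 0 are missing codes and are not rescaled.
                if value is not None and value > 0:
                    new_row[var] = value * 10
        fixed.append(new_row)
    return fixed


# Made-up example records (not real SOEP data).
rows = [
    {"syear": 1990, "east_sample": True, "plb0186_v2": 4.0, "plb0186_h": 4.0},
    {"syear": 1990, "east_sample": False, "plb0186_v2": 40.0, "plb0186_h": 40.0},
    {"syear": 1990, "east_sample": True, "plb0186_v2": -1, "plb0186_h": -1},
]
fixed_rows = fix_working_time(rows)
```

Only the first record is rescaled. The second is outside the East sample, and the third holds a missing code under the stated assumption.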
Detailed information on the general process used to harmonize variables can be found here:
Versioning and harmonization of variables
Working with harmonized Variables
2. Dataset: bhh
Variables: bhh_37_01, bhh_37_02
The names assigned to the raw variables bhh_37_01 “electricity included in rent” and bhh_37_02 “assessed burden of housing expenses (rent and additional expenses)” do not correspond to the standard SOEP concept for naming variables. Both variables had to be renamed:
bhh_37_01 “Electricity included in rent” → bhh_33
bhh_37_02 “Assessed burden of housing expenses (rent and additional costs)” → bhh_37
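For analysis scripts that were written against the old names, the renaming listed above can be applied as a simple mapping. The following Python sketch is only an illustration: the record layout, the `hid` field, and the values are made up, and only the two old/new name pairs are taken from the list above.

```python
# Old raw variable names mapped to their new names (pairs taken from the list above).
RENAMES = {"bhh_37_01": "bhh_33", "bhh_37_02": "bhh_37"}


def rename_variables(record, renames=RENAMES):
    """Return a copy of the record with old variable names replaced by new ones."""
    # Refuse to overwrite a variable that already carries the new name.
    clashes = [new for old, new in renames.items() if old in record and new in record]
    if clashes:
        raise ValueError(f"target name already present: {clashes}")
    return {renames.get(name, name): value for name, value in record.items()}


# Made-up example record (not real SOEP data).
old_record = {"hid": 1, "bhh_37_01": 2, "bhh_37_02": 3}
new_record = rename_variables(old_record)
```

The clash check is a design choice for the sketch: it avoids silently merging an old and a new variable if both happen to be present in the same file.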
To find out more about how raw variables are named in the SOEP, see the SOEPcompanion:
Naming conventions of Variables and Datasets
3. Dataset: migspell
Unfortunately, an outdated previous version of the migspell dataset was delivered. To obtain the current version, please contact the SOEPhotline or send an email to soepmail.
4. Dataset: biobirth, bioimmig, biojob, bioparen, bioresid, biosib, biosoc, biotwin, pflege
Variables: pid, cid, hid
In the process of “merging” SOEP-Long and SOEP-Core, all of the SOEP-Long ID variables (pid, hid, cid) were also included in the raw datasets to make merging easier for users. In some datasets, the ID variables were created but not filled in with the corresponding IDs.
Empty pid: biobirth, bioimmig, biojob, bioparen, bioresid, biosib, biosoc, biotwin, pflege
Empty hid: bioimmig, bioresid, biosoc
Empty cid: biobirth, bioimmig, biojob, bioparen, bioresid, biosib, biosoc, biotwin, pflege
With these datasets, please continue to use persnr, hhnrakt, hhnr, or copy the content into the corresponding new ID variable.
clonevar pid = persnr
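The idea of copying the old identifiers into the empty new ID variables can be sketched as follows. This is a minimal Python illustration under stated assumptions: the pairing pid/persnr follows the text, while the pairings hid/hhnrakt and cid/hhnr are assumptions about which old identifier corresponds to which new one. The record layout and the values are made up.

```python
# New identifier -> old identifier it is filled from.
# pid/persnr follows the text; the hid and cid pairings are assumptions.
ID_SOURCES = {"pid": "persnr", "hid": "hhnrakt", "cid": "hhnr"}


def fill_ids(record, sources=ID_SOURCES):
    """Return a copy of the record in which empty new IDs are filled from old IDs."""
    filled = dict(record)
    for new_id, old_id in sources.items():
        # Only fill IDs that are empty; existing values are never overwritten.
        if filled.get(new_id) is None and old_id in record:
            filled[new_id] = record[old_id]
    return filled


# Made-up example record (not real SOEP data).
record = {"persnr": 101, "hhnrakt": 27, "hhnr": 1, "pid": None, "hid": None, "cid": None}
filled_record = fill_ids(record)
```

Filling only empty values keeps the sketch safe to run on datasets in which some of the new identifiers are already populated.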
Further information on SOEP identifiers can be found here:
Dataset Identifier
1984-2016 (Wave BG)
May 18, 2018 |
1. Dataset $PGEN: Variable casmin$$
A missing parenthesis in the program code led to individuals in CASMIN category 6 ("(2c_gen) general maturity certificate") being mistakenly placed in CASMIN category 7 ("(2c_voc) vocational maturity certificate"). For wave BG, this means that of the 4,553 observations in category 7, 1,976 actually belong in category 6 and only 2,577 in category 7. This can be corrected with the existing variables in the $PGEN data. For wave BG, it can be done as follows:
2. Dataset [BE-BG]PGEN: Variable [be-bg]pbilla ("Vocational Degree Outside Germany")
The variable $$pbilla (foreign degrees, vocational education) in SOEP v33 was expanded retrospectively to include information on whether the degree had been completed. This revision failed, however, to take into account some of the information covered in certain modules. A correction can be made with the existing variables in the $PGEN data, as shown here: Statement (TXT, 2.72 KB)
3. Dataset BIOAGEL: Variable bioage
In the dataset BIOAGEL, the data type was not adjusted for the variable bioage. The variable shows which questionnaire the row of data was taken from. Because bioage has contained values > 99 since v33, these values were cut off in Stata. The cut-off values are:
4. Dataset CIRDEF: Variable rgroup
The variable rgroup divides the SOEP sample into 20 equally sized groups. It is used to select the 50% sample. Since the new samples M3 and M4 were incorrectly assigned, there are no cases from these samples in the teaching version of the SOEP data.
January 30, 2018 | Various updates forced us to distribute a new version. Please see the 'Changes in the Dataset' page for the documentation of the changes. |
1984-2015 (Wave BF)
February 15, 2017 | Various updates forced us to distribute a new version. Please see the doi landing page soep.v31.2 for the documentation of the changes. |
1984-2014 (Wave BE)
June 6, 2016 |
In the file with generated longitudinal data on children (KIDLONG) in SOEP-Core v31.1, another correction had to be implemented: a small amount of data that had been collected only in the FiD study was missing.
March 18, 2016 | Various updates forced us to distribute a new version. Please see the doi landing page soep.v31.1 for the documentation of the changes. |
1984-2012 (Wave BC)
Mar. 27, 2014 |
HGEN: Errors in the imputation of electricity, heating, and additional expenses for tenants in the current data distribution resulted in values that were too high. These errors also affected the generation of rent including maintenance but excluding heating. The variables affected are electr$$, heat$$, util$$, rent$$, and frent$$ for the years 2008 to 2012. The variables typ1hh12 and typ2hh12 changed for two households.
BCPKAL: In the 2012 survey year, after the suspension of compulsory military service in Germany, the related calendar information in the individual questionnaire was revised. This revision was made in the original individual data for 2012 but not in the corresponding calendar data; these have now been updated retrospectively for the data distribution v29.
Both errors were corrected, and an update is available for download upon request (soepmail@diw.de). If you use this updated version in your work, please cite the version number, SOEP v29.1 (or better, doi: 10.5684/soep.v29.1), in publications using these data. |
1984-2011 (Wave BB)
Dec. 19, 2012 |
BIOCOUPLM, BIOCOUPLY, BIOMARSM, BIOMARSY $FAMSTD
An update for all corrected files can be downloaded by means of a personalized link. Please contact soepmail@diw.de to obtain your link. Please note: If you use one of the provided bugfixes in your analyses, we recommend citing it as follows: |
1984-2010 (Wave BA)
March 30, 2012 |
BIOAGE03 BIOAGE06 BIOAGE08 LIFESPELL
An update for all corrected files can be downloaded, but only by means of a personalized link. Please contact soepmail@diw.de to obtain such a link. Please note: If you use one of the provided bugfixes in your analyses, we recommend citing it as follows: |
Jan 2, 2012 |
COGDJ English labels
Also, in the $PGEN datasets, no English value labels were generated for the new variables on educational degrees and training qualifications prior to joining the panel. This applies to the English labels for the following variables: If you use one of those variables, please contact soepmail@diw.de to obtain a download link for the bugfixes.
PPFADL in SOEPlong
An update for PPFADL can be downloaded, but only by means of a personalized link. Please contact soepmail@diw.de to obtain such a link. Please note: If you use one of the provided bugfixes in your analyses, we recommend citing it as follows: |
1984-2009 (Wave Z)
Jan. 6, 2011 |
The current household number was assigned incorrectly for 3% of the children in the generated longitudinal dataset KIDLONG. The variable HHNRAKT has been corrected accordingly. Please contact soepmail@diw.de if you use the KIDLONG dataset. We will provide an individualized method of downloading the corrected version for both the 100% dataset for the EEA countries and the 95% version available for use worldwide.
Feb. 10, 2010 |
Downloadable bug-fix for children's weighting factors of wave Y (2008)
Individuals born in 2002 (thus being 6 years of age in wave Y, 2008) whose parents completed the newly introduced child questionnaire for this particular cohort did not receive a valid score on the wave-specific cross-sectional weighting variable (this population can be identified by YNETTO=23). This affects the variable YPHRF in the file PPHRF and the variable W1110108 in the file YPEQUIV. This inaccuracy applies only to these 237 children aged 6 in this particular wave and affects only the individual weights, not the household weights. Moreover, any weighted analysis based only on adult respondents using, for instance, the YP and YPGEN files is virtually unaffected by this error. Users who wish to include the six-year-olds in a weighted analysis are asked to download updated versions of the datasets YPHRF and YPEQUIV. Please send an email to soepmail@diw.de to request a personalized URL and further details. |
Dec. 5, 2009 |
In the dataset BIOIMMIG an incorrect assignment to the variable BIGOBACK (the variable on the probability to return home) was made for the categories -2 (“does not apply”) and 2 (“Yes, probably”) in some cases since 2001. To correct this error, please download the appropriate script for your statistical program (SAS, SPSS or Stata) and run it after adjusting the script to the path of your local settings. Script for Stata (TXT, 320.45 KB)
Nov. 9, 2009 |
Shortly after completing the DVD, an error in data generation was identified in the file BIOPAREN. To correct this error, please download the appropriate script for your statistical program (SAS, SPSS or Stata) and run it after adjusting the script to the path of your local settings. Script for Stata (TXT, 75.48 KB) If you need an update for another statistical program, please contact our hotline at soepmail@diw.de. |
Dec. 04, 2008 | In the process of extensive checking, several problems were identified in the 1984-2007 data distribution currently available on DVD (waves A-X).
The corrected datasets are now available to be downloaded as a password-protected ZIP file from our homepage. To obtain download access to the corrected datasets, please send an e-mail to soepmail@diw.de or call the SOEPhotline at +49 30 89789 292. To unzip the files, you will need the password for the current 1984-2007 data distribution, or the password used to access the expanded regional data in the GGKBOU dataset. If you do not have the current data distribution, please contact our hotline (soepmail@diw.de). The fixed files are:
Apr. 03, 2008 | We found incorrect value labels for the variables indicating the owner of the dwelling (VH27 and WH27). Please note the relevant corrections in the table below.
This will be fixed with the next data release. Variable Label: Owner Of The Dwelling
Value | Wrong | Correct
Mar. 31, 2008 | In the information on school and occupational training, the data on graduations and completed training since 2005 contained errors (variables PSBIL and PBBIL01-03 in VPGEN and WPGEN). Further information can be obtained from Henning Lohmann and Peter Krause. |
Sept. 28, 2007 | In the process of inputting the revised ERWZEIT variables, the VEBZEIT variables in columns of the same name from previous years were overwritten. Both variables have now been corrected in the PGEN files for the years 1984-1997 (Waves A-N). Users who need data from before 1998 for their analyses should use the new PGEN files. The updated data are provided in the various formats for downloading. Please request the password from the SOEPhotline. |
Jul. 14, 2006 | In BIOPAREN, the values for the following variables contain errors:
Jul. 13, 2006 |
In BIOAGE01 the labels for the variable BCKSTOER are missing. value labels |
Jul. 12, 2006 | In Microsoft Windows, the links on CD 3 to document names containing "-en" (for example, links to documentation on the generated variables in English) are incorrect. If you receive an error message when attempting to access a particular document, change the "-en" to "_en" in your browser's address bar. With Linux and Unix, you shouldn't have any problems. |
Aug. 24, 2005 | In 2005, the SOEP group, together with our fieldwork agency TNS Infratest Sozialforschung, carried out extensive checks on all regional identifiers in the SOEP data, such as administrative districts and federal states. First, this enabled us to replace missing values of regional identifiers, even in past years, with valid information. Second, in some cases the regional identifiers $BULA and $SAMPREG have been corrected for former waves. With these changes, all information concerning regional identifiers in the SOEP should be consistent.
The checks mentioned above were finalized after the data production of our most recent CD-ROM (up to wave U, 2004). If you are interested in using the corrected information, you may apply the following statements (TXT, 9.92 KB). |
Feb. 18, 2005 | Some variables are displayed oddly, probably only when Stata is used with Windows 2000. More information is available in German. |
Dec. 10, 2004 | Since the distribution of SOEP data 1984-2003, some variables have been corrected or modified. |
Dec. 19, 2003 | POP variables in the data distribution within Germany
Provisional values for the generated variables for population membership (SPOP and SHPOP) have inadvertently been distributed. We will provide an update at the beginning of next year. The POP variables which rely on extrapolation factors have been calculated using the correct data and are therefore not affected. |
Dec. 18, 2003 | Data distribution within Germany
Due to an error in the setup program for the 1984-2002 data distribution for Stata and SPSS, the file "BIOJOB" is not automatically installed. SAS users are not affected. To gain access to the "BIOJOB" file through Stata or SPSS, it has to be installed manually using a program-specific command.
d:\data\gsoep\sta_100.exe -pass=******** biojob.* (Stata-Files), in order to install the respective statistical package. If you have any further problems or questions please ask Rainer Pischner. |
Nov. 4, 2003 | Data distribution within Germany
A LABEL bug in the file BIOPAREN on the German CD 1984-2002. In the file BIOPAREN we discovered a small value label bug. It affects the variables VNAT and MNAT. The label for value 2 has to be "andere Staatsangehörigkeit als deutsch" and not "türkisch". |
Unfortunately, we have found a few more LABEL bugs in the English distribution of the Person Files. The data are correct but incorrectly labeled. You can download code in Stata, SPSS and SAS that can be copied and run. Simply edit the pathname of where you installed the data at the top of the code chunk. That will patch things up quickly. Sorry for any hassles caused. John Haisken-DeNew |
Unfortunately, we have found a few more VAR LABEL bugs in the English distribution of QP (Person File 2000). The data are correct but incorrectly labeled (variable labels). Attached is code in Stata, SPSS and SAS that can be copied and run. Simply edit the pathname of where you installed the data at the top of the code chunk (and, for SPSS only, also at the bottom). That will patch things up quickly. Sorry for any hassles caused.
===================== STATA ====================
use c:\gsoep17\qp
get file='c:\gsoep17\qp.sav'.
libname soep 'c:\gsoep17';