The 2019 SOEP User Survey, conducted from mid-December 2019 to early January 2020, marked both the start of a new decade and the ten-year anniversary of the SOEP User Survey. Every year, the SOEP User Survey gives data users the opportunity to tell us about their experiences working with the SOEP data. Our 2019 survey focused on three topics: the technical preconditions for data analysis, the quality of the SOEP data, and our Getting Started services. We are grateful to the 812 respondents, whose valuable input will help us to continue developing and improving the SOEP.
In this section of the survey, we asked data users what technical problems they had experienced, if any, and what hardware and software they needed to be able to work well with the SOEP data. The majority of users had no problems opening the SOEP datasets, but some had difficulties, for instance, processing the numerous variables in our individual long-format dataset in Stata/IC (Figure 1).
Based on this feedback, we developed several recommendations for data users. We recommend the use of Stata/MP or Stata/SE on a computer with 16 GB of internal memory. Users can still work with the data in Stata/IC or on less powerful computers: some modifications, such as the commands “describe using pl.dta” and “use pid syear plVARS using pl.dta”, allow users to work effectively with even our largest datasets while placing low demands on their hardware and software.
In the second section of the survey, we asked users what they thought about various aspects of SOEP data quality. The results show strengths in the areas of reliability and punctuality. Users saw the greatest potential for improvement in the areas of documentation and user-friendliness. Based on this feedback, we have introduced improvements in these areas—for instance, in our Getting Started toolbox of services for new and returning data users.
Ratings are on a Likert scale; the means are displayed as a blue line and the medians by researcher status as red dots.
The results of our 2018 User Survey showed a need for improvements in user-friendliness of the data, so over the course of 2019, the SOEP continued developing Getting Started, a toolbox of services designed to facilitate data use for new and returning users. These services include Paneldata.org, SOEPcompanion, SOEPtutorials, and SOEPhelp. In our 2019 User Survey, we invited users to rate these services. We asked whether they knew of each service, whether they had ever used it, and if so, whether they used it regularly or just occasionally. We then asked whether they would recommend each service to others. We only included answers from respondents who had used a service at least once. The survey results show that a large majority of SOEP data users would recommend our Getting Started services to others.
We are grateful to all of the users who took part in our User Survey and look forward to the next ten years of working together with the SOEP research community.
In 2018, the SOEP User Survey entered its ninth year as an online survey. The annual survey invites users to give us their opinions, ideas, wishes, and critique, and to alert us to potential problems. This year, we shortened the questionnaire slightly to focus more on how researchers use and assess the different datasets and to give users more opportunities for feedback. We are grateful to the 797 respondents to our 2018 User Survey for their suggestions, which will help us to continue improving our data and services.
To stay abreast of the changing needs of SOEP data users and to create the best possible conditions for analyzing the SOEP data, we regularly ask what statistical software users work with. While Stata use had increased in popularity in previous years, it declined in 2018 (Figure 1, multiple answers allowed). Stata remains the most popular statistical software among SOEP users at 76 percent, followed by the programming language R at 31 percent. To respond to this trend, we will be providing a version of future data releases formatted for analysis in R.
Use of the Studies
We are not only interested in finding out what software SOEP users work with, but also in what they are analyzing. The question of which SOEP datasets users work with regularly has been part of our user survey for many years. The results for 2018 present a stable picture relative to previous years (Figure 2). In 2018, 36% of users reported using SOEP-Core on a regular basis, and over two thirds of SOEP users have worked with SOEP-Core at least once. Just under one third of users reported regular use of SOEPlong, a dataset that we provide in easier-to-use “long” form. Survey results showed that users are less familiar with SOEPlong than with SOEP-Core. Regular use of SOEP-IS, a sample designed for innovative research questions, is relatively low (3%). Around one third of all users were not aware of SOEP-IS.
Important Aspects of the SOEP
As in 2017, our 2018 User Survey asked users to rate the SOEP on specific quality criteria (Figure 3). Again using a seven-point Likert scale, users first rated how important each of the quality criteria was to them, and then evaluated the SOEP’s current performance in each area. The results show that users place the highest importance on understanding the process of data generation and on getting a quick idea of whether SOEP data would fit their research project. Both categories show a negative difference between expectations and realities. The SOEP is exceeding users’ expectations in the punctuality of data releases and in the e-mail information sent out to users about new SOEP studies and projects.
Strengths and Weaknesses
Sometimes we see ourselves differently from how others see us. To be able to build on our strengths and improve on our weaknesses, we asked users to tell us the SOEP’s three greatest strengths and weaknesses in an open-answer question. We sorted the diverse answers into 16 thematic categories. We compared the number of respondents who considered each category a strength or weakness and compiled the overview in Figure 4. The SOEP’s three greatest strengths were in the diversity and number of themes and variables, the long duration of the study, and the data format. Data access is a less pronounced strength: users regard it as positive but see some potential for improvement. Significant weaknesses lie in the user-friendliness of the data and documentation. We addressed both points in the data release following the 2018 user survey: we now provide an integrated version of the data, and our new SOEPcompanion is online at http://companion.soep.de/index.html.
We are grateful to all of the respondents to our 2018 User Survey for their useful feedback and suggestions!
From November 20, 2017, to January 3, 2018, SOEP users had the opportunity to take part in our annual SOEP User Survey and contribute their opinions, requests, and ideas for the further development of the services provided by the SOEP. In addition to standard questions about our various services and infrastructural work, this year’s user survey also included questions focusing on optimizing user friendliness. We are grateful to the 757 users who participated in the survey for their valuable feedback and many suggestions.
Working with the SOEP data for the first time often poses a major challenge to our users. In order to increase user friendliness and address the specific problems users face, we wanted to start by clearly defining our SOEP user community. Around 63 percent of respondents worked with the SOEP for the first time in 2016 or 2017 and can therefore be considered “new users.” We refer to the approximately 37 percent of respondents who had used the SOEP data for the first time before 2016 as “old users.” Our new users are, on average, 31 years old, female (53 percent), and work in economics (62 percent) as doctoral students (35 percent) or research assistants (29 percent). Old users are approximately seven years older, and the majority are male (65 percent) and professors (33 percent) or research assistants (39 percent) in economics (43 percent) and sociology (42 percent).
This year, SOEP User Survey respondents were asked to rate the SOEP on a series of quality criteria (Figure 1) using a seven-point Likert scale. We then compared users’ expectations for each criterion with their ratings of the SOEP’s performance. The SOEP exceeded users’ expectations in areas such as punctuality of data release, information on new studies and projects, and the possibility to submit questions to SOEP-IS, and performed below users’ expectations in the area “understandable data generation process”.
To identify potential problems faced by first-time SOEP data users, this year’s survey asked respondents what subjects they would like to have covered in the SOEP’s instructional materials (Figure 2). Users were asked to think back to the first time they worked with the SOEP data and rate (on a seven-point Likert scale) how useful an instructional manual on specific topics would have been at that time, and whether the currently available instructional materials are sufficient. Respondents rated instructions on how to find the history of questions and variables (mean: 5.6) and understand the meaning of terms for datasets and variables (mean: 5.6) as extremely important. They reported having problems especially in “finding the history of questions and variables,” and felt that the available information (mean: 4.6) should be expanded. In the area “survey instruments and their contents,” the information provided by the SOEP met users’ expectations. Users rated detailed written descriptions with screenshots as their preferred form of instructional materials (Figure 3).
The SOEP offers an array of SOEPcampus events to make it easier to get started working with the SOEP. To tailor our workshops to old and new users’ needs, we asked User Survey respondents how important the various types of workshops are (Figure 4). When comparing the percentages of old and new users who rated a particular workshop as “very important,” we found differences in demands: introductory workshops on “techniques for preparing datasets for analysis” (difference: 18 percent) and “methods for analyzing longitudinal data (e.g., panel regression)” were much more important to new users. Among old users, there was a higher demand for an “introduction into the use of paneldata.org” (difference: -14%).
The extensive feedback from our User Survey is a valuable source of information that helps us to continually improve our work and our services. We are very grateful to all of the SOEP users who participated in our survey in 2017!
As we reported in the last SOEPnewsletter, a number of SOEP users were kind enough to participate in the user survey during the last two months of 2016. In addition to the classical questions on our services and infrastructure work, we were highly interested in finding out the level of user awareness about our various studies and how SOEP data are used. Approximately 30 percent of the 713 total respondents were new users who worked with the SOEP data for the first time in 2016. For the sixth year in a row, our survey was targeted to all SOEP data users and contracting parties.
Awareness level of the studies
This year, our questionnaire contained a larger group of topics on the awareness of and interest in our SOEP-based studies and their strengths and weaknesses. As a result, we were able to determine that around half of the respondents were not aware of the options that SOEP-IS offers, but found them interesting and would probably use them in the future (see Figures 1 and 2). This included the option of proposing questions and experiments for SOEP-IS, or evaluating other researchers’ questions and experiments after a suitable hiatus period. The survey also showed that many respondents are interested in the IAB-BAMF-SOEP Survey of Refugees in Germany and in using the dataset available in November 2017 (see Figure 3).
Use of the data
With regard to the use of our data, we also wanted to find out which statistics programs researchers use and how frequently (see Figure 4). Stata is now used by 75 percent of users and has established itself as the most frequently used program, while R is gaining in frequency of use.
Thanks to the very detailed feedback from our respondents, we have also received valuable suggestions on how to make our work and the services we offer even better. Their suggestions for improving documentation have given us the impetus to make it clearer and more user-friendly. We are aware that the “landscape of SOEP studies” is filling up, and providing suitable aids is essential for beginning to work with SOEP or continuing to use it. For this reason, in addition to restructuring the paneldata.org metadata portal, we plan to completely revise the SOEPlong documentation. When integrating the IAB-BAMF-SOEP Survey of Refugees, we will update the existing SOEP data structure and provide additional separate variables generated purely for the migration. They will be documented in an understandable manner. Using our familiar channels (the SOEPnewsletter and SOEP website), we will keep you informed on the status of these activities.
We would again like to express our gratitude to all 2016 user survey respondents!
In winter 2015, 771 SOEP users again took part in the SOEP user survey. This year’s survey covered classic questions about SOEP service and infrastructure as well as the new topics of data sharing in academia and re-analysis of data. To ensure the highest possible participation in our survey, we sent the invitation to an integrated mailing list consisting of longtime SOEP users with a data distribution contract, new users who signed a subcontract for data use within the last year, users who download the SOEP data, and members of the SOEP mailing list. We are proud to report that we achieved the highest number of responses of any year since the start of our user survey. Participation increased 13% over 2014 (see Figure 1). We are very grateful to everyone who participated in the survey.
We do not know the characteristics of our entire user community. In the following we use the term “user community” to refer to those users who participated in our user survey.
The results show that in 2015, our user community was 41% women and 59% men: an 8 percentage point increase in female users and the highest number of female users since the beginning of the survey in 2004.
Research staff and post-doctoral students made up one third of all respondents to the user survey, while the percentage of professors has declined since last year. This has been accompanied by a decline in the use of SOEP data in teaching (from 69% in 2014 to 61% in 2015).
The research fields represented by SOEP users have not changed in a significant manner since the last user survey. The proportion of respondents from the field of economics has declined to 45% since the last survey. Around 41% of our users are from the social sciences or sociology.
In this year’s user survey, we wanted to find out our users’ preferences for data distribution. The increasing complexity of the SOEP sample means an increasing amount of effort to generate the data. To meet this challenge, the SOEP staff is constantly working to improve the process of data preparation and generation. We want to give you—our users—the opportunity to tell us your preferences so that we can meet your needs as well as possible. In the survey, respondents were asked to drag and drop the aspects of “advance data access”, “quality of data checking and testing” and “completeness of the data” into their own order of importance (see Figure 2). The results show that for our users, “advance data access” is less important than data quality checking or data completeness.
Based on this, we have concluded that we should put more weight on completeness and data checking procedures, even if this means delays in data provision.
We use our annual survey to evaluate the various services we provide. Respondents were asked to rate on a scale from 0 to 10 how satisfied they were with SOEP contract management, data, data downloads, and documentation. In all of these areas, the overwhelming majority of responding users were very satisfied with our services. And in some areas, respondents rated us even higher this year than in 2014.
The importance of data documentation is also evident from the critiques and suggestions provided by respondents, which confirm the need to continue improving our work in this area. An important step in this direction has been taken with the introduction of our new metadata portal, paneldata.org. The difficulties entailed by learning a new way of working are evident in Figure 3. Many of our users continue to use our old metadata portal, SOEPinfo, which continues to run parallel to paneldata.org. Almost half of all respondents were not yet aware of paneldata.org, at least not under this new name that we introduced instead of SOEPinfo v.2. Thanks to our respondents’ extensive feedback, we have valuable ideas for facilitating the transition to paneldata.org.
We are working hard to optimize paneldata.org and to make it as user-friendly as possible. We encourage all users to take the leap and switch over, since paneldata.org contains documentation not only on SOEP-Core but also on the practical new SOEPlong, as well as the SOEP Innovation Sample (SOEP-IS) and other related longitudinal studies.
Thank you again to everyone who participated in our 2015 SOEP User Survey!
In fall 2014, a notable 662 SOEP users, to whom we are very grateful, again took part in our SOEP User Survey and gave us feedback on the range of services we currently provide. We are very pleased to be able to rely on a stable user community for our annual online survey. The number of users who completed the entire survey has risen continuously over the last four years (see Figure 1). Of the 662 users who expressed interest, 581 completed the survey. This corresponds to a dropout rate of around 12%, down slightly from the previous year (16%).
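The dropout rate quoted above follows directly from the two counts. As a rough check, a minimal sketch in Python using only the figures reported here; the function name `dropout_rate` is a hypothetical helper chosen for illustration and is not part of any SOEP tooling:

```python
def dropout_rate(started: int, completed: int) -> float:
    """Share of respondents who started the survey but did not complete it."""
    if started <= 0:
        raise ValueError("started must be positive")
    if not 0 <= completed <= started:
        raise ValueError("completed must lie between 0 and started")
    return (started - completed) / started


# Counts reported for the 2014 User Survey.
rate_2014 = dropout_rate(started=662, completed=581)
print(f"Dropout rate 2014: {rate_2014:.1%}")
```

With 662 respondents starting and 581 finishing, 81 dropped out, which is consistent with the "around 12%" stated in the text.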
- A rising number of professors act as multipliers of the SOEP -
The results of the survey show a change in the composition of our user community with regard to occupational status and location of the institution. While the percentage of students dropped from 14.5% in 2013 to 8.7% in 2014, the percentage of professors has increased substantially to 45.7% (previous year: 24.7%).
At the same time, the share of participants who neither work nor have worked personally with the SOEP data increased. Figure 2 illustrates the increase in professors who use the data passively and take more of a role in disseminating the use of the SOEP. Sixty-four percent of professors responded “yes” to the additional question of whether they supervise junior researchers writing a thesis using SOEP data. We are pleased with this channel for enlarging our user community, although many of the questions in our user survey can only be answered with first-hand experience working with the SOEP data.
The survey always asks respondents to evaluate each of our individual service areas. As seen in Figure 3, users gave high marks for the data download service introduced in 2013. Users reported equally high levels of satisfaction with the quality of the data and contract management. In the area of documentation, however, respondents saw room for improvement. We are well aware of the importance of our data documentation and are working constantly to improve in this area. One example is the recently updated Desktop Companion. Since the SOEP is still expanding with its various Related Studies (SOEP-RS), increasing effort is required to produce detailed documentation. Integrating the FiD study, which ran through 2014, into SOEP Core poses one such challenge. In 2015, we will be focusing on adapting these data to guarantee consistent and user-friendly documentation of the SOEP data.
- SOEPinfo v.2 to provide better access to data documentation -
In addition, we plan to further establish the use of our new metadata portal, SOEPinfo v.2, in our user community. It was developed as part of the open-source project “DDI on Rails” and includes not only thorough documentation of SOEP Core from the previous online resource, SOEPinfo, but also a complete picture of the SOEPlong data. The user survey showed that just a few months after SOEPinfo v.2 was introduced, around one-third of all respondents had already worked with it (see Figure 4). This group of respondents gave the version they used an average of 7 out of 10 possible points. In four out of six subcategories (visual design, information content, quality of the generated syntax, and response speed), the average rating was 7 or above. Overall, users’ evaluations of the new SOEPinfo were about as high as those in 2011 in the same categories.
Survey respondents who had not used the new SOEPinfo v.2 reported that they were continuing to use the old SOEPinfo mainly out of habit or because they did not see a need to switch. We are very curious to see how users will respond in the next user survey to the question of which data documentation sources they use. Until then, we would like to encourage all our users to take advantage of this new service, SOEPinfo v.2, and especially of the opportunity to provide us with feedback.
The research project was carried out as an online survey using the web-based instrument LimeSurvey©. The online survey was conducted in November 2013 under strict adherence to data protection regulations. In addition to our invited SOEP contract holders, other users also had the option of taking part in the survey outside the framework of LimeSurvey© by clicking on a link on the DIW Berlin webpage. A total of 585 anonymized responses were obtained for analysis.
Under the age of 30, there are more female than male data users. Above the age of 30, this pattern is reversed. In 2013, approximately 61% of all users were male.
Economists and social scientists have established themselves over time as the main groups of researchers in the SOEP and in 2013 made up over 80% of all data users.
¹ Includes public health, other social sciences, and in 2004 also psychology.
² Also includes information science.
If you are interested in more results, please see our poster (PDF, 0.75 MB).
A total of 574 users participated in the survey conducted in spring 2012, corresponding to a response rate of 21%. Most of our users continue to come from an economics (49%) or sociology (36%) background.
Because the SOEP is a longitudinal study, it was particularly important for us to learn more about whether you also use this property of the data for your analyses. The results showed that only 19% of respondents use solely the cross-sectional aspect of our data and 22% use solely the longitudinal component. The majority (59%) use both the cross-sectional and longitudinal analysis potential of SOEP data.
A clear shift in the use of statistical software is evident. In 2004, SPSS was the most frequently used software (66%), followed by Stata (28%) and SAS (5%). In 2012, Stata ranks highest (55%), SPSS second (26%), and the open source software R has replaced SAS in third place with 9%.
To get a better picture of how SOEP users feel about the various services we provide, including data quality, data access, and documentation, we carry out regular surveys of users in Germany and abroad. Our main objective in the 2011 User Survey, which was conducted last summer, was to obtain feedback and suggestions for further improvements.
We sent out 1,996 e-mails to SOEP contract and sub-contract holders, and received answers from 443 users (22.2 percent). This figure corresponds fairly precisely to the number of "active" SOEP users who requested and received a data DVD in 2010 (N = 420). We thank all our users who responded to our invitation by completing the questionnaire.
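The response rate above can be reproduced from the two counts given in the text. A minimal sketch in Python, assuming nothing beyond the reported figures; the helper name `response_rate` is hypothetical and used only for illustration:

```python
def response_rate(invited: int, responded: int) -> float:
    """Fraction of invited users who answered the survey."""
    if invited <= 0:
        raise ValueError("invited must be positive")
    return responded / invited


# Counts reported for the 2011 User Survey.
rate_2011 = response_rate(invited=1996, responded=443)
print(f"Response rate 2011: {rate_2011:.1%}")

# Gap between respondents and the "active" users who received a data DVD in 2010.
gap_to_active_users = 443 - 420
print(f"Respondents exceed active users by {gap_to_active_users}")
```

The same calculation can be applied to any other survey year for which both the number of invitations and the number of responses are reported.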
As in previous years, the majority of this year's respondents came from the fields of economics (50 percent) and sociology (33 percent), followed by psychology (6 percent), statistics (4 percent), and political science (2 percent). The remaining 6 percent work in medicine, education, and geography. Most respondents work in Germany (70 percent) and the European Union (20 percent). Six percent of respondents work in North America and 4 percent in other parts of the world. Overall, users reported a high level of satisfaction with SOEP service: on a scale from 0 to 10, the reported overall mean satisfaction was 8.3, satisfaction with data access was 8.6, and satisfaction with documentation was 7.9. Only five respondents reported dissatisfaction (values between 0 and 4).
The results on data use show that more than 80 percent of respondents are using the longitudinal component of the data. This is good news for us, since it confirms that we are on the right track with our new data format, SOEPlong, which promises to make work with the SOEP data easier for many users. SOEPlong significantly reduces the number of datasets by consolidating all those that are similar, and solves the problem of variable names differing from one wave to the next. Despite the fact that it is still in the beta stage, SOEPlong is already being used by 20 percent of user survey respondents. In this year's data release, we are already providing the second, improved beta version of SOEPlong.
As ever, we would be grateful for your feedback and suggestions.
The survey results on the use of SOEP data in teaching also proved very interesting. Although 68 percent of respondents teach at the university level, only 17 percent of them are using the special teaching version of the SOEP data. In fact, only 42 percent of respondents active in teaching were aware of the existence of the special teaching data set. In the future, we plan to provide users with more information about the possibilities of using this special SOEP dataset in teaching.
The User Survey provided useful feedback on SOEPinfo as well: 13 percent of respondents were unfamiliar with SOEPinfo. To rectify this, we plan to give SOEPinfo a more prominent place on our homepage and to further improve the possibilities it offers. One goal is to incorporate metadata information on the SOEPlong data format into a web-based metadata information system.
The 2011 User Survey showed a significant change in the software used with the SOEP data since the last user survey in 2004 (see Fig. 1). Most respondents are now using Stata, which has taken the lead over SPSS. The open-source software R is used by 8 percent of respondents. Relatively few users are working with Mplus (3%), SAS (3%), or TDA (2%).
Stata users were also asked which version of Stata they are using. The results showed that three-quarters of Stata users work with Stata/SE or Stata/MP. A substantially smaller group is using the limited intercooled version, Stata/IC, which allows only a limited number of variables within a dataset and is thus unable to open the entire individual dataset “PL” in SOEPlong (note: it is possible, however, to load only the variables needed in Stata/IC by selecting them directly with the command “use”: “use HID PID varlist using pl.dta”).
The importance of our regular communication with you, the SOEP users, is underscored by the fact that two errors were pointed out to us in responses to the survey. We have corrected these in the new SOEP.v27 data release (one error in variable correspondence and one in the English labels).
We are very grateful to everyone who participated and will do our best to put your very useful suggestions into practice.