
Better together? Regression analysis of complex survey data after ex-post harmonization

External discussion papers

Anna-Carolina Haensch, Bernd Weiß

2020 (SocArXiv Papers)

Abstract

An increasing number of researchers pool, harmonize, and analyze survey data from different survey providers to answer their research questions. They aim to study heterogeneity between groups over a long period or to examine smaller subgroups, research questions that can be impossible to answer with a single survey. This combination or pooling of data is known as individual person data (IPD) meta-analysis in medicine and psychology; in sociology, it is understood as part of ex-post survey harmonization (Granda et al. 2010). However, in medicine and psychology, most original studies focus on treatment or intervention effects and apply experimental research designs to reach causal conclusions. In contrast, many sociological or economic studies are nonexperimental. Unlike experimental data, survey-based data is subject to complex sampling and nonresponse. Ignoring the complex sampling design can lead to biased population inferences, not only for population means and shares but also for regression coefficients, which are widely used in the social sciences (DuMouchel and Duncan 1983; Solon et al. 2013). To account for complex sampling schemes or non-ignorable unit nonresponse, survey-based data often comes with survey weights. But how should survey weights be used after pooling different surveys? We build upon the work of DuMouchel and Duncan (1983) and Solon et al. (2013) on survey-weighted regression analysis with a single data set. First, through Monte Carlo (MC) simulations, we show that endogenous sampling and heterogeneity-of-effects models require survey weighting to obtain approximately unbiased estimates after ex-post survey harmonization. Second, we focus on a list of methodological questions: Do survey-weighted one-stage and two-stage (meta-)analytical approaches perform differently? Is it possible to include random effects in a one-stage analysis, especially if we have to assume study heterogeneity? Our simulations show that two-stage analysis will be biased if the weights' variation is high, whereas one-stage analysis remains unbiased. We also show that the inclusion of random effects in a one-stage analysis is challenging but doable; in most cases, the weights must be transformed. Apart from the MC simulations, we also show the difference between two-stage and one-stage approaches with real-world data on same-sex couples in Germany.
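
The contrast between one-stage and two-stage survey-weighted analysis can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the simulated data, the uniform "survey weights", the sandwich variance, and the coefficient-wise fixed-effect inverse-variance pooling are all assumptions chosen for illustration. The sketch does not rescale weights across surveys and includes no random effects.

```python
import numpy as np

def weighted_ols(X, y, w):
    """Survey-weighted least squares: solve (X'WX) b = X'Wy."""
    Xw = X * w[:, None]
    return np.linalg.solve(Xw.T @ X, Xw.T @ y)

def weighted_ols_with_cov(X, y, w):
    """Weighted coefficients plus a sandwich (robust) covariance estimate."""
    beta = weighted_ols(X, y, w)
    resid = y - X @ beta
    bread = np.linalg.inv((X * w[:, None]).T @ X)
    score = X * (w * resid)[:, None]
    return beta, bread @ (score.T @ score) @ bread

def one_stage(surveys):
    """Stack all surveys and fit a single weighted regression."""
    X = np.vstack([s[0] for s in surveys])
    y = np.concatenate([s[1] for s in surveys])
    w = np.concatenate([s[2] for s in surveys])
    return weighted_ols(X, y, w)

def two_stage(surveys):
    """Fit each survey separately, then pool each coefficient
    with inverse-variance (fixed-effect) weights."""
    betas, precisions = [], []
    for X, y, w in surveys:
        beta, cov = weighted_ols_with_cov(X, y, w)
        betas.append(beta)
        precisions.append(1.0 / np.diag(cov))
    betas, precisions = np.array(betas), np.array(precisions)
    return (betas * precisions).sum(axis=0) / precisions.sum(axis=0)

# Hypothetical data: three surveys sharing the model y = 1 + 2x + noise.
rng = np.random.default_rng(0)
surveys = []
for n in (400, 500, 600):
    x = rng.normal(size=n)
    y = 1.0 + 2.0 * x + rng.normal(size=n)
    w = rng.uniform(0.5, 2.0, size=n)  # made-up survey weights
    surveys.append((np.column_stack([np.ones(n), x]), y, w))

b_one = one_stage(surveys)  # [intercept, slope] from the pooled fit
b_two = two_stage(surveys)  # [intercept, slope] from pooled per-survey fits
```

In this toy setting the weights are unrelated to the outcome, so both estimators should land close to the true slope; it illustrates the mechanics only and does not reproduce the bias patterns reported in the abstract.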



Keywords: survey weight, meta-analysis, survey harmonization, multi-level
External link:
https://osf.io/preprints/socarxiv/edm3v/download

DOI:
https://doi.org/10.31235/osf.io/edm3v
