
Talk
Estimating the Impact of Alternative Multiple Imputation Methods on Longitudinal Wealth Data

Christian Westermeier, Markus M. Grabka


Evidenzbasierte Wirtschaftspolitik : Jahrestagung des Vereins für Socialpolitik 2014
Hamburg, 07.09.2014 - 10.09.2014



Abstract:
Statistical analysis of survey data often has to contend with missing data. Because case-wise deletion and single imputation have undesirable properties, multiple imputation remains as a way to handle this problem. In a longitudinal study, where past or future data points may be available for some missing values, the question arises of how to turn this advantage into better imputation models. In a simulation study, the authors compare six combinations of cross-sectional and longitudinal imputation strategies for German wealth panel data (SOEP wealth module). They create simulation data sets by blanking out observed data points, inducing item non-response through a missing at random (MAR) mechanism and two separate missing not at random (MNAR) mechanisms. They test the performance of multiple imputation using chained equations (MICE), an imputation procedure for panel data known as the row-and-columns method, and a regression specification with correction for sample selection including a stochastic error term. The regression and MICE approaches serve as fallback methods when only cross-sectional data are available. Even though the regression approach omits certain stochastic components, so that estimators based on its results are likely to underestimate the uncertainty of the imputation procedure, it performs poorly compared with the MICE set-up. The row-and-columns method, a univariate method, performs well on both longitudinal and cross-sectional evaluation criteria. These results show that if the variables to be imputed are assumed to exhibit high state dependence, univariate imputation techniques such as row-and-columns imputation should not be dismissed out of hand.
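The simulation design described in the abstract (blanking out observed values under MAR and MNAR mechanisms) can be illustrated with a minimal sketch. The abstract does not specify the authors' actual missingness models, so everything here is an assumption made for illustration: the function name `induce_missingness`, the rank-based missingness probability, the 20% target rate, and the synthetic "income" and "wealth" variables, which are not SOEP data. Under MAR the probability of being blanked out depends only on an observed covariate; under MNAR it depends on the value that is itself being removed.

```python
import numpy as np


def induce_missingness(values, covariate, mechanism, rate=0.2, seed=0):
    """Blank out roughly `rate` of `values`; return (values_with_nan, mask).

    Hypothetical helper for illustration only, not the authors' procedure.
    mechanism="MAR":  missingness probability rises with the observed covariate.
    mechanism="MNAR": missingness probability rises with the value itself.
    """
    if mechanism not in ("MAR", "MNAR"):
        raise ValueError("mechanism must be 'MAR' or 'MNAR'")
    values = np.asarray(values, dtype=float)
    covariate = np.asarray(covariate, dtype=float)
    driver = covariate if mechanism == "MAR" else values

    # Rank-based score in (0, 1); its mean is exactly 0.5, so the mean
    # missingness probability equals `rate` as long as rate <= 0.5.
    ranks = driver.argsort().argsort()
    score = (ranks + 0.5) / len(driver)
    prob = np.clip(2.0 * rate * score, 0.0, 1.0)

    rng = np.random.default_rng(seed)
    mask = rng.random(len(values)) < prob
    blanked = values.copy()
    blanked[mask] = np.nan
    return blanked, mask


# Synthetic example data (assumed, not taken from the SOEP).
demo_rng = np.random.default_rng(42)
income = demo_rng.normal(size=5000)
wealth = np.exp(10 + 0.8 * income + demo_rng.normal(size=5000))

wealth_mar, mask_mar = induce_missingness(wealth, income, "MAR")
wealth_mnar, mask_mnar = induce_missingness(wealth, income, "MNAR")
print("share blanked (MAR): ", mask_mar.mean())
print("share blanked (MNAR):", mask_mnar.mean())
```

Because the true values behind every blanked entry are known, any imputation method applied to `wealth_mar` or `wealth_mnar` can afterwards be scored against the original `wealth`, which is the basic logic of such a simulation study.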


Markus M. Grabka

Member of the SOEP Board of Directors and Acting Head of the Knowledge Transfer Division in the research infrastructure Sozio-oekonomisches Panel (Socio-Economic Panel)

