Talk
Estimating the Impact of Alternative Multiple Imputation Methods on Longitudinal Wealth Data

Christian Westermeier, Markus M. Grabka

29th Annual Congress of the European Economic Association : EEA 2014
Toulouse, France, 25.08.2014 - 29.08.2014

Abstract:
Statistical analysis of survey data often has to cope with missing data. Because case-wise deletion and single imputation have undesirable properties, multiple imputation remains as a way to handle this problem. In a longitudinal study, where past or future data points may be available for some missing values, the question arises of how to turn this advantage into better imputation models. In a simulation study, the authors compare six combinations of cross-sectional and longitudinal imputation strategies for German wealth panel data (SOEP wealth module). They create simulation data sets by blanking out observed data points, inducing item non-response through a missing at random (MAR) mechanism and two separate missing not at random (MNAR) mechanisms. They test the performance of multiple imputation using chained equations (MICE), an imputation procedure for panel data known as the row-and-columns method, and a regression specification with correction for sample selection that includes a stochastic error term. The regression and MICE approaches serve as fallback methods when only cross-sectional data are available. Even though the regression approach omits certain stochastic components, so that estimators based on its results are likely to underestimate the uncertainty of the imputation procedure, it performs poorly compared with the MICE set-up. The row-and-columns method, a univariate method, performs well on both longitudinal and cross-sectional evaluation criteria. These results show that if the variables to be imputed are assumed to exhibit high state dependence, univariate imputation techniques such as row-and-columns imputation should not be dismissed beforehand.
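To make the distinction between the two kinds of missingness mechanism concrete, the following is a minimal sketch of how observed values could be blanked out under MAR and MNAR. It is not the authors' specification: the logistic response model, its coefficients, the covariate (age), the function name blank_out and the toy data are all assumptions chosen for illustration.

```python
import math
import random


def logistic(x):
    """Map a real number to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def blank_out(records, mechanism, rng):
    """Return the wealth values with some entries replaced by None.

    records   : list of (age, wealth) tuples, all fully observed
    mechanism : "MAR"  -> missingness depends only on the observed covariate (age)
                "MNAR" -> missingness depends on the wealth value itself
    The coefficients below are illustrative assumptions, not estimates.
    """
    mean_age = sum(age for age, _ in records) / len(records)
    mean_wealth = sum(w for _, w in records) / len(records)
    result = []
    for age, wealth in records:
        if mechanism == "MAR":
            p_missing = logistic(-1.5 + 0.05 * (age - mean_age))
        elif mechanism == "MNAR":
            p_missing = logistic(-1.5 + (wealth - mean_wealth) / mean_wealth)
        else:
            raise ValueError("mechanism must be 'MAR' or 'MNAR'")
        result.append(None if rng.random() < p_missing else wealth)
    return result


def observed_mean(values):
    """Mean of the entries that were not blanked out."""
    observed = [v for v in values if v is not None]
    return sum(observed) / len(observed)


# Toy data (hypothetical): uniform ages and right-skewed, log-normal wealth.
rng = random.Random(42)
records = [(rng.uniform(20, 80), rng.lognormvariate(11, 1)) for _ in range(2000)]

mar = blank_out(records, "MAR", rng)
mnar = blank_out(records, "MNAR", rng)
true_mean = sum(w for _, w in records) / len(records)

print("true mean wealth:         ", round(true_mean))
print("observed mean under MAR:  ", round(observed_mean(mar)))
print("observed mean under MNAR: ", round(observed_mean(mnar)))
```

In this sketch the MNAR mechanism removes high wealth values more often, so the mean of the remaining observations falls below the true mean. An imputation model that conditions only on observed covariates cannot fully correct for this, which is why a simulation study would evaluate methods under both kinds of mechanism.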


Markus M. Grabka

Researcher, Socio-Economic Panel (SOEP)

Topics: Survey methodology and data science
