Multiple Imputation of Binary Multilevel Missing Not at Random Data

Refereed Articles (Web of Science)

Angelina Hammon, Sabine Zinn

In: Journal of the Royal Statistical Society / Series C 69 (2020), 3, pp. 547–564

Abstract

We introduce a selection model‐based multilevel imputation approach to be used within the fully conditional specification framework for multiple imputation. Concretely, we apply a censored bivariate probit model to describe binary variables assumed to be missing not at random. The first equation of the model defines the regression model for the missing data mechanism. The second equation specifies the regression model of the variable to be imputed. The non‐random selection of the binary data is mapped by correlations between the error terms of the two regression models. Hierarchical data structures are modelled by random intercepts in both equations. To fit the novel imputation model we use maximum likelihood and adaptive Gauss–Hermite quadrature. A comprehensive simulation study shows the overall performance of the approach. We test its usefulness for empirical research by applying it to a common problem in social scientific research: the emergence of educational aspirations. Our software is designed to be used in the R package mice.
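To make the missing-data mechanism described in the abstract concrete, the following is a minimal simulation sketch of a generic selection model with two probit equations, correlated error terms, and a random intercept per cluster in each equation. It is not the authors' implementation (their software targets the R package mice); the function names, the single covariate `x`, and all coefficients and variances are illustrative assumptions chosen only to show why a non-zero error correlation makes the observed values unrepresentative.

```python
import math
import random


def simulate_mnar_binary(n_clusters=200, cluster_size=20, rho=0.7,
                         sd_sel=0.5, sd_out=0.5, seed=1):
    """Simulate binary multilevel data with non-random selection.

    All numeric values are hypothetical. `rho` is the correlation between
    the error terms of the selection and the outcome equation.
    """
    rng = random.Random(seed)
    rows = []
    for j in range(n_clusters):
        # One random intercept per cluster in each equation.
        u_sel = rng.gauss(0.0, sd_sel)
        u_out = rng.gauss(0.0, sd_out)
        for _ in range(cluster_size):
            x = rng.gauss(0.0, 1.0)
            # Standard normal errors with correlation rho.
            e_sel = rng.gauss(0.0, 1.0)
            e_out = rho * e_sel + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
            # Selection equation: is the binary variable observed?
            observed = (0.3 + 0.5 * x + u_sel + e_sel) > 0
            # Outcome equation: the binary variable itself.
            y_true = int((-0.2 + 0.8 * x + u_out + e_out) > 0)
            rows.append({"cluster": j, "x": x, "y_true": y_true,
                         "y": y_true if observed else None})
    return rows


def observed_vs_true_mean(rows):
    """Return (mean of observed y, mean of complete y_true)."""
    observed = [r["y"] for r in rows if r["y"] is not None]
    complete = [r["y_true"] for r in rows]
    return sum(observed) / len(observed), sum(complete) / len(complete)
```

Calling `observed_vs_true_mean(simulate_mnar_binary())` compares the share of ones among the observed cases with the share in the complete data. Under these assumed settings the two differ noticeably, which is the kind of distortion a selection-model-based imputation is meant to account for.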

Sabine Zinn

SOEP Director German Socio-Economic Panel study

Topics: Survey methodology and data science

Keywords: Fully conditional specification, Missingness not at random, Multilevel data, Multiple imputation, Selection model
DOI:
https://doi.org/10.1111/rssc.12401

Open-access version: (econstor)
http://hdl.handle.net/10419/222432

Supplementary material to “Multiple imputation of binary multilevel missing not at random data”
https://rss.onlinelibrary.wiley.com/action/downloadSupplement?doi=10.1111%2Frssc.12401&file=rssc12401-sup-0001-Supinfo.pdf
