High-dimensional missing data imputation via undirected graphical model


Research field: Databases



Venue: Statistics and Computing


Abstract

Multiple imputation is a practical approach to analyzing incomplete data, and multiple imputation by chained equations (MICE) is among the most widely used variants. MICE specifies a conditional distribution for each variable to be imputed, but estimating that distribution is inherently a high-dimensional problem for large-scale data. Existing approaches address this with regularized regression models such as the lasso. However, these models must be re-estimated iteratively for every incomplete variable, which considerably increases the computational burden, as demonstrated in our simulation study. To overcome this computational bottleneck, we propose a novel method that estimates the conditional independence structure among the variables before the imputation procedure begins. We extract this structure from an undirected graphical model, using the graphical lasso applied to an inverse probability weighting (IPW) estimator. Our simulation study shows that the proposed method is substantially faster than the existing methods while maintaining comparable imputation performance.
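
The idea of fixing each variable's predictor set before imputation can be sketched as follows. This is an illustrative sketch only, not the authors' implementation: the function names `ipw_covariance` and `neighbor_sets` are hypothetical, the data are assumed to be missing completely at random, and a ridge-regularized inverse with thresholding is used as a crude stand-in for the graphical lasso named in the abstract.

```python
import numpy as np

def ipw_covariance(X):
    """Pairwise covariance estimate reweighted by observation rates.

    X is an (n, p) array with np.nan marking missing entries.
    Assumes MCAR and that every pair of variables is jointly
    observed in at least one row.
    """
    observed = ~np.isnan(X)
    means = np.nanmean(X, axis=0)
    Z = np.where(observed, X - means, 0.0)   # centered, missing set to 0
    n = X.shape[0]
    mask = observed.astype(float)
    pair_prob = (mask.T @ mask) / n          # estimated P(j and k both observed)
    return (Z.T @ Z) / (n * pair_prob)

def neighbor_sets(cov, ridge=0.1, threshold=0.2):
    """Estimate each variable's graph neighbors from a covariance matrix.

    Stand-in for the graphical lasso: clip eigenvalues (the IPW estimate
    need not be positive semidefinite), add a ridge term, invert, and
    threshold the resulting partial correlations.
    """
    p = cov.shape[0]
    w, V = np.linalg.eigh(cov)
    cov_psd = (V * np.clip(w, 1e-6, None)) @ V.T
    precision = np.linalg.inv(cov_psd + ridge * np.eye(p))
    d = np.sqrt(np.diag(precision))
    partial_corr = -precision / np.outer(d, d)
    return {
        j: [k for k in range(p) if k != j and abs(partial_corr[j, k]) > threshold]
        for j in range(p)
    }

# Toy data: x1 depends on x0, x2 is independent of both.
rng = np.random.default_rng(0)
n = 5000
x0 = rng.normal(size=n)
x1 = x0 + 0.5 * rng.normal(size=n)
x2 = rng.normal(size=n)
X = np.column_stack([x0, x1, x2])
X[rng.random(X.shape) < 0.2] = np.nan        # 20% of entries missing at random

neighbors = neighbor_sets(ipw_covariance(X))
```

In a MICE-style loop, `neighbors[j]` would then serve as the fixed predictor set for variable `j`, so no variable selection has to be repeated inside the imputation iterations.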


Author Profile
Yoonah Lee

Department of Statistics, Sungshin Women’s University, 34 da-gil Bomun-ro, Seoul 02844, Korea

Romania
Author Profile
Seongoh Park

School of Mathematics, Statistics and Data Science, Sungshin Women’s University, 34 da-gil Bomun-ro, Seoul 02844, Korea

Andorra

📄 Paper information

Publication year: 2024
Citations: 3
Publication country: Romania, Andorra
Site: Springer

Related papers (28)