Identifying missing data handling methods with text mining


Research field: Databases



Venue: International Journal of Data Science and Analytics


Abstract

Missing data is an inevitable aspect of all empirical research. Researchers have developed several techniques for handling missing data in order to avoid information loss and bias. Over the past 50 years, these methods have become more efficient and also more complex. Building on previous review studies, this paper analyzes which missing data handling methods are used across scientific disciplines. For the analysis, we used nearly 50,000 scientific articles published between 1999 and 2016. JSTOR provided the data in text format. We used a text-mining approach to extract the necessary information from our corpus. Our results show that the use of advanced missing data handling methods, such as Multiple Imputation or Full Information Maximum Likelihood estimation, grew steadily over the examination period. At the same time, simpler methods, such as listwise and pairwise deletion, are still in widespread use.
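
To make the text-mining step more concrete, the following is a minimal sketch of dictionary-based matching, one plausible way to detect mentions of missing data handling methods in article text and count them per year. The abstract does not describe the authors' actual pipeline, so the pattern dictionary `METHOD_PATTERNS`, the helper names `find_methods` and `count_by_year`, and the toy corpus are all assumptions made for illustration.

```python
import re
from collections import Counter

# Hypothetical keyword patterns; the paper's actual search terms are not given here.
METHOD_PATTERNS = {
    "multiple imputation": re.compile(r"\bmultiple\s+imputations?\b", re.IGNORECASE),
    "FIML": re.compile(
        r"\bfull[\s-]+information\s+maximum[\s-]+likelihood\b|\bFIML\b",
        re.IGNORECASE,
    ),
    "listwise deletion": re.compile(r"\blistwise\s+deletion\b", re.IGNORECASE),
    "pairwise deletion": re.compile(r"\bpairwise\s+deletion\b", re.IGNORECASE),
}


def find_methods(text):
    """Return the set of method labels whose pattern occurs in the text."""
    return {label for label, pattern in METHOD_PATTERNS.items() if pattern.search(text)}


def count_by_year(articles):
    """Count, per (year, method), how many articles mention the method."""
    counts = Counter()
    for year, text in articles:
        for label in find_methods(text):
            counts[(year, label)] += 1
    return counts


# Toy corpus of (publication year, article text) pairs, invented for illustration.
articles = [
    (1999, "Cases with missing values were removed using listwise deletion."),
    (2016, "We applied multiple imputation; FIML served as a robustness check."),
    (2016, "Missing responses were handled with Multiple Imputation."),
]

counts = count_by_year(articles)
```

Counting articles rather than raw keyword occurrences keeps a single paper that repeats a term many times from dominating the trend; dividing each count by the number of articles per year would turn these counts into usage shares.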


Authors

Krisztián Boros
Graduate School of Economics, Waseda University, Totsukamachi 1-104, Shinjuku, 169-8050 Tokyo, Japan

Zoltán Kmetty
Centre for Social Sciences, HUN-REN, Tóth Kálmán str. 4, 1097 Budapest, Hungary

📄 Paper information

Publication year: 2024
Citations: 3
Country of publication: Hungary, Japan
Site: Springer