Inproceedings

Four Factors Affecting Missing Data Imputation

Proceedings of the 35th International Conference on Scientific and Statistical Database Management, pages 1–2. New York, NY, USA, Association for Computing Machinery, (Aug 27, 2023)
DOI: 10.1145/3603719.3604285

Abstract

Missing data is a common problem in datasets and impacts the reliability of data analysis. Numerous methods to impute (i.e., predict and replace) missing values have been proposed. The quality of these imputed values depends on factors like correlation, percentage of missingness, or the mechanism behind the missing value. Despite comparative studies on imputation methods, conditions for their effectiveness and safe application lack dedicated investigation. This research aims to systematically investigate the impact of four factors on imputation quality. We specifically investigate the extent to which (1) missing data mechanism, (2) variable distribution, (3) correlation, and (4) percentage of missingness affect the imputation quality of eight different machine-learning-based imputation methods. The evaluation will be done on both a synthetic dataset and a real-world dataset from voestalpine Stahl GmbH.
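
As a rough illustration of how two of the factors named in the abstract (correlation and percentage of missingness) can be varied in an experiment, the following minimal sketch masks values completely at random (MCAR) in a synthetic two-variable dataset and compares two simple imputers by RMSE. It is not the authors' setup: the eight machine-learning-based methods are not named here, so mean imputation and one-variable linear regression stand in for them, and all function names, parameters, and the Gaussian data generator are assumptions made for this example.

```python
import math
import random


def make_data(n, rho, rng):
    """Draw n pairs (x, y) of standard normals with correlation about rho."""
    pairs = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        noise = rng.gauss(0.0, 1.0)
        y = rho * x + math.sqrt(1.0 - rho * rho) * noise
        pairs.append((x, y))
    return pairs


def mcar_mask(n, fraction, rng):
    """Pick round(fraction * n) row indices completely at random (MCAR)."""
    k = round(fraction * n)
    return set(rng.sample(range(n), k))


def impute_mean(pairs, missing):
    """Replace each missing y with the mean of the observed y values."""
    observed = [y for i, (_, y) in enumerate(pairs) if i not in missing]
    mean_y = sum(observed) / len(observed)
    return {i: mean_y for i in missing}


def impute_regression(pairs, missing):
    """Replace each missing y with a least-squares prediction from x."""
    obs = [(x, y) for i, (x, y) in enumerate(pairs) if i not in missing]
    mx = sum(x for x, _ in obs) / len(obs)
    my = sum(y for _, y in obs) / len(obs)
    sxx = sum((x - mx) ** 2 for x, _ in obs)
    sxy = sum((x - mx) * (y - my) for x, y in obs)
    slope = sxy / sxx
    intercept = my - slope * mx
    return {i: intercept + slope * pairs[i][0] for i in missing}


def rmse(pairs, imputed):
    """Root mean squared error between true and imputed y values."""
    se = sum((pairs[i][1] - v) ** 2 for i, v in imputed.items())
    return math.sqrt(se / len(imputed))


def run(rho=0.8, fraction=0.2, n=2000, seed=0):
    """Run one experiment cell (hypothetical defaults) and return RMSEs."""
    rng = random.Random(seed)
    pairs = make_data(n, rho, rng)
    missing = mcar_mask(n, fraction, rng)
    return {
        "mean": rmse(pairs, impute_mean(pairs, missing)),
        "regression": rmse(pairs, impute_regression(pairs, missing)),
    }
```

Calling run() over a grid of rho and fraction values gives one RMSE pair per cell. A non-MCAR mechanism could be sketched by making the masking probability depend on x (MAR) or on y itself (MNAR), and a different variable distribution by swapping the generator in make_data.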
