Robust distance-based imputation techniques.
In this work we explore distance-based methods for imputing missing values in complex mixed-type data sets. Our proposal relies on robust distances, computed with the dbrobust R package, which allows numerical and categorical variables to be combined while reducing the influence of outliers. In particular, we analyze several real data sets with varying percentages of missing data and evaluate the efficiency and computing time of our proposal and of several competing methods. The results show that robust methods offer efficient missing value imputation.
Keywords: data imputation; dbrobust; mixed-type data; robust distances
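
To make the general idea in the abstract concrete, the following is a minimal Python sketch of distance-based imputation for mixed-type data. It is an illustration under stated assumptions, not the method evaluated in this work, and it does not use or reproduce the dbrobust API. The distance is a Gower-type average in which numerical differences are scaled by the median absolute deviation (MAD) and capped at 1, so that a single outlying value cannot dominate. Missing cells are then filled from the k nearest donor rows. The function names (robust_scale, robust_gower, impute), the cap of 3 MADs, the choice k = 2, and the toy data are all hypothetical.

```python
from collections import Counter
from statistics import median


def robust_scale(values):
    """MAD of the observed values (assumed scale choice); 1.0 if the MAD is zero."""
    observed = [v for v in values if v is not None]
    center = median(observed)
    mad = median([abs(v - center) for v in observed])
    return mad if mad > 0 else 1.0


def robust_gower(a, b, is_numeric, scales, cap=3.0):
    """Gower-type distance over the variables observed in both rows.

    Numerical differences are divided by cap * MAD and truncated at 1,
    which bounds the contribution of outliers. Categorical variables use
    simple matching (0 if equal, 1 otherwise).
    """
    total, count = 0.0, 0
    for x, y, numeric, scale in zip(a, b, is_numeric, scales):
        if x is None or y is None:
            continue  # skip variables missing in either row
        if numeric:
            total += min(abs(x - y) / (cap * scale), 1.0)
        else:
            total += 0.0 if x == y else 1.0
        count += 1
    return total / count if count else float("inf")


def impute(rows, is_numeric, k=2):
    """Fill each missing cell from the k nearest rows that observe that variable."""
    columns = list(zip(*rows))
    scales = [robust_scale(col) if numeric else 1.0
              for col, numeric in zip(columns, is_numeric)]
    completed = [list(row) for row in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Rank candidate donors by distance to the incomplete row.
            donors = sorted(
                (robust_gower(row, other, is_numeric, scales), idx)
                for idx, other in enumerate(rows)
                if idx != i and other[j] is not None
            )[:k]
            donor_values = [rows[idx][j] for _, idx in donors]
            if not donor_values:
                continue  # no donor available; leave the cell missing
            if is_numeric[j]:
                completed[i][j] = median(donor_values)
            else:
                completed[i][j] = Counter(donor_values).most_common(1)[0][0]
    return completed


# Toy data (hypothetical): two numerical variables and one categorical one.
rows = [
    (1.0, 10.0, "a"),
    (1.2, 11.0, "a"),
    (0.9, None, "a"),    # missing numerical value
    (5.0, 50.0, "b"),
    (5.2, 52.0, None),   # missing categorical value
    (4.8, 49.0, "b"),
    (100.0, 12.0, "a"),  # outlier in the first variable
]
is_numeric = [True, True, False]
completed = impute(rows, is_numeric, k=2)
print(completed[2], completed[4])
```

In this toy example the outlying row does not distort the scaling of the first variable, because the MAD is barely affected by a single extreme value, and its capped distance keeps it out of the donor set for the incomplete rows.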