missing data;
data mining;
lexicographic order;
nonparametric;
imputation;
tree-based models;
DOI:
Not available
CLC number:
O21 [Probability theory and mathematical statistics];
C8 [Statistics];
Subject classification code:
020208 ;
070103 ;
0714 ;
Abstract:
Conditional mean imputation is a common way to deal with missing data. Although it is very simple to implement, the method may suffer from model misspecification and yields unsatisfactory results for nonlinear data. We propose the iterative use of tree-based models for missing data imputation in large databases. The proposed procedure uses a lexicographic order to rank the missing values that occur in different variables and imputes them incrementally, i.e., it augments the data with the previously filled-in records according to the defined order.
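The incremental idea can be illustrated with a short plain-Python sketch. It is a hypothetical illustration under stated assumptions, not the authors' implementation. It assumes numeric variables, `None` for missing entries, and at least one fully observed column. It also assumes a lexicographic key of (number of missing values, column index) for ordering the incomplete variables. The small CART-style regression tree (`fit_tree`, `predict`) and the name `incremental_tree_impute` are our own. Each variable, once filled in, is appended to the predictor set used for the next variable in the order.

```python
from statistics import mean


def _sse(values):
    """Sum of squared errors around the mean (CART regression impurity)."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def fit_tree(X, y, depth=0, max_depth=3, min_leaf=2):
    """Fit a tiny regression tree; leaves are floats, nodes are (feature, threshold, left, right)."""
    if depth >= max_depth or len(y) < 2 * min_leaf or len(set(y)) == 1:
        return mean(y)
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [i for i, row in enumerate(X) if row[j] <= t]
            right = [i for i, row in enumerate(X) if row[j] > t]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            sse = _sse([y[i] for i in left]) + _sse([y[i] for i in right])
            if best is None or sse < best[0]:
                best = (sse, j, t, left, right)
    if best is None:
        return mean(y)
    _, j, t, left, right = best
    return (
        j,
        t,
        fit_tree([X[i] for i in left], [y[i] for i in left], depth + 1, max_depth, min_leaf),
        fit_tree([X[i] for i in right], [y[i] for i in right], depth + 1, max_depth, min_leaf),
    )


def predict(node, x):
    """Follow splits down to a leaf value."""
    while isinstance(node, tuple):
        j, t, left, right = node
        node = left if x[j] <= t else right
    return node


def incremental_tree_impute(data, max_depth=3, min_leaf=2):
    """Hypothetical sketch: impute incomplete columns one at a time in lexicographic order."""
    rows = [list(r) for r in data]  # work on a copy; the input is left untouched
    n_cols = len(rows[0])
    missing = {j: sum(r[j] is None for r in rows) for j in range(n_cols)}
    complete = [j for j in range(n_cols) if missing[j] == 0]
    if not complete:
        raise ValueError("this sketch needs at least one fully observed column")
    # Assumed lexicographic key: fewest missing values first, ties broken by column index.
    order = sorted((j for j in range(n_cols) if missing[j] > 0), key=lambda j: (missing[j], j))
    for target in order:
        train = [r for r in rows if r[target] is not None]
        if not train:
            raise ValueError(f"column {target} has no observed values")
        X = [[r[j] for j in complete] for r in train]
        y = [r[target] for r in train]
        tree = fit_tree(X, y, max_depth=max_depth, min_leaf=min_leaf)
        for r in rows:
            if r[target] is None:
                r[target] = predict(tree, [r[j] for j in complete])
        # The freshly filled column augments the predictors for the next variable.
        complete.append(target)
    return rows
```

Because every predictor column is complete when a tree is fitted, predictions for the missing entries are always defined. A production version would more likely use an established tree learner than this toy CART.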