Nearest neighbor selection for iteratively kNN imputation

Cited by: 266
Authors
Zhang, Shichao [1 ,2 ,3 ]
Affiliations
[1] Guangxi Normal Univ, Coll Comp Sci & Informat Technol, Guilin, Peoples R China
[2] Chinese Acad Sci, Inst Comp Technol, Beijing, Peoples R China
[3] Univ Technol Sydney, Fac Engn & Informat Technol, QUIS, Sydney, NSW 2007, Australia
Funding
Australian Research Council
Keywords
Missing data; k nearest neighbors; kNN imputation; MISSING VALUE ESTIMATION; CLASSIFICATION; PREDICTION; LIKELIHOOD; ALGORITHM; SYSTEMS; VALUES;
DOI
10.1016/j.jss.2012.05.073
CLC number
TP31 [Computer Software]
Discipline codes
081202; 0835
Abstract
Existing kNN imputation methods for dealing with missing data are designed according to Minkowski distance or its variants, and have been shown to be generally efficient for numerical variables (features, or attributes). To deal with heterogeneous (i.e., mixed-attribute) data, we propose a novel kNN (k nearest neighbor) imputation method that iteratively imputes missing data, named GkNN (gray kNN) imputation. GkNN selects the k nearest neighbors for each missing datum by calculating the gray distance between the missing datum and all the training data, rather than using traditional distance metrics such as Euclidean distance. This distance metric can handle both numerical and categorical attributes. To achieve better effectiveness, GkNN regards all imputed instances (i.e., missing data that have been imputed) as observed data, which, together with the complete instances (instances without missing values), are used to iteratively impute the remaining missing data. We experimentally evaluate the proposed approach and demonstrate that the gray distance is much better than the Minkowski distance both at capturing the proximity relationship (or nearness) of two instances and at dealing with mixed attributes. Moreover, experimental results also show that the GkNN algorithm is much more efficient than existing kNN imputation methods. (c) 2012 Elsevier Inc. All rights reserved.
Pages: 2541-2552 (12 pages)
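The abstract does not give the exact gray distance formula, so the sketch below uses the conventional gray relational coefficient from gray relational analysis, (Δmin + ρΔmax)/(Δ + ρΔmax) with ρ = 0.5, treats categorical attributes as 0/1 mismatches, and imputes with the neighbors' mean (numeric) or mode (categorical). All of these specifics are assumptions for illustration, not the paper's exact procedure; the iterative step, where imputed instances rejoin the reference pool as observed data, follows the abstract's description.

```python
import numpy as np

RHO = 0.5  # distinguishing coefficient; 0.5 is the conventional default


def grey_grade(x, pool, numeric):
    """Gray relational grade of each complete row in `pool` w.r.t. the
    incomplete instance `x` (higher grade = nearer neighbor).
    Differences: |a - b| for numeric attributes, 0/1 mismatch for
    categorical ones (an assumption; the paper may treat them differently)."""
    delta = np.array([
        [abs(a - b) if numeric[j] else float(a != b)
         for j, (a, b) in enumerate(zip(x, y)) if a is not None]
        for y in pool
    ])
    dmin, dmax = delta.min(), delta.max()
    coeff = (dmin + RHO * dmax) / (delta + RHO * dmax + 1e-12)
    return coeff.mean(axis=1)  # average coefficient over observed attributes


def gknn_impute(data, numeric, k=3, n_iter=2):
    """Iterative GkNN-style imputation: neighbors are chosen by gray
    relational grade, and imputed instances join the reference pool
    ('regarded as observed data') in subsequent rounds."""
    data = [list(row) for row in data]
    for _ in range(n_iter):
        pool = [row for row in data if None not in row]  # complete + imputed rows
        for row in data:
            if None not in row or not pool:
                continue
            grade = grey_grade(row, pool, numeric)
            nbrs = [pool[i] for i in np.argsort(grade)[::-1][:k]]  # k highest grades
            for j, v in enumerate(row):
                if v is None:
                    vals = [n[j] for n in nbrs]
                    row[j] = (float(np.mean(vals)) if numeric[j]
                              else max(set(vals), key=vals.count))  # mean / mode
    return data


# Toy mixed-attribute dataset: two numeric columns, one categorical column.
numeric = [True, False, True]
data = [
    [1.0, "a", 10.0],
    [1.2, "a", 11.0],
    [5.0, "b", 50.0],
    [5.2, "b", 52.0],
    [1.1, "a", None],   # numeric value missing
    [5.1, None, 51.0],  # categorical value missing
]
completed = gknn_impute(data, numeric, k=2)
```

Because newly completed rows enter `pool` on the next pass, later missing values can borrow from earlier imputations, which is what distinguishes the iterative scheme from a single-pass kNN imputation.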
Related Papers
50 records
  • [1] A new two-layer nearest neighbor selection method for kNN classifier
    Wang, Yikun
    Pan, Zhibin
    Dong, Jing
    KNOWLEDGE-BASED SYSTEMS, 2022, 235
  • [2] Predicting the number of nearest neighbor for kNN classifier
    Li, Yanying
    Yang, Youlong
    Che, Jinxing
    Zhang, Long
    IAENG International Journal of Computer Science, 2019, 46 (04) : 1 - 8
  • [3] Nearest neighbor imputation algorithms: a critical evaluation
    Beretta, Lorenzo
    Santaniello, Alessandro
    BMC MEDICAL INFORMATICS AND DECISION MAKING, 2016, 16
  • [4] Multiple imputation using nearest neighbor methods
    Faisal, Shahla
    Tutz, Gerhard
    INFORMATION SCIENCES, 2021, 570 : 500 - 516
  • [5] Nonparametric Variance Estimation for Nearest Neighbor Imputation
    Shao, Jun
    JOURNAL OF OFFICIAL STATISTICS, 2009, 25 (01) : 55 - 62
  • [6] Nearest neighbor imputation for categorical data by weighting of attributes
    Faisal, Shahla
    Tutz, Gerhard
    INFORMATION SCIENCES, 2022, 592 : 306 - 319
  • [7] Jackknife variance estimation for nearest-neighbor imputation
    Chen, JH
    Shao, J
    JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, 2001, 96 (453) : 260 - 269
  • [8] Iteratively Multiple Projections Optimization for Product Quantization in Nearest Neighbor Search
    Li, Jin
    Lan, Xuguang
    Zheng, Nanning
    2017 IEEE INTERNATIONAL CONFERENCE ON BIG KNOWLEDGE (IEEE ICBK 2017), 2017, : 65 - 71