Nearest neighbor selection for iteratively kNN imputation

Cited by: 266
Authors
Zhang, Shichao [1, 2, 3]
Affiliations
[1] Guangxi Normal Univ, Coll Comp Sci & Informat Technol, Guilin, Peoples R China
[2] Chinese Acad Sci, Inst Comp Technol, Beijing, Peoples R China
[3] Univ Technol Sydney, Fac Engn & Informat Technol, QUIS, Sydney, NSW 2007, Australia
Funding
Australian Research Council;
Keywords
Missing data; k nearest neighbors; kNN imputation; MISSING VALUE ESTIMATION; CLASSIFICATION; PREDICTION; LIKELIHOOD; ALGORITHM; SYSTEMS; VALUES;
DOI
10.1016/j.jss.2012.05.073
Chinese Library Classification number
TP31 [Computer software];
Discipline classification code
081202 ; 0835 ;
Abstract
Existing kNN imputation methods for handling missing data are built on the Minkowski distance or its variants and have been shown to be generally efficient for numerical variables (features, or attributes). To handle heterogeneous (i.e., mixed-attribute) data, we propose a novel kNN (k nearest neighbor) imputation method that imputes missing data iteratively, named GkNN (gray kNN) imputation. GkNN selects the k nearest neighbors of each missing datum by computing the gray distance between the missing datum and all the training data, rather than by a traditional distance metric such as the Euclidean distance. This distance metric can handle both numerical and categorical attributes. To improve effectiveness, GkNN treats all imputed instances (i.e., instances whose missing data have been imputed) as observed data and uses them, together with the complete instances (instances without missing values), to iteratively impute the remaining missing data. We evaluate the proposed approach experimentally and demonstrate that the gray distance is much better than the Minkowski distance at both capturing the proximity relationship (or nearness) of two instances and dealing with mixed attributes. Moreover, the experimental results show that the GkNN algorithm is much more efficient than existing kNN imputation methods. (c) 2012 Elsevier Inc. All rights reserved.
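The idea in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: the abstract does not give the gray distance formula, so the sketch assumes the standard gray relational grade from gray relational analysis (distinguishing coefficient `rho = 0.5`), counts a categorical mismatch as a difference of 1, assumes numerical attributes are pre-scaled to [0, 1], and makes a single pass in which each imputed row joins the neighbor pool. The names `gray_grades` and `gknn_impute` are hypothetical.

```python
from collections import Counter


def gray_grades(target, candidates, attrs, is_cat, rho=0.5):
    """Gray relational grade of each candidate to target (higher = nearer).

    Assumption: standard gray relational coefficient, averaged over attrs.
    """
    if not attrs or not candidates:
        return [1.0] * len(candidates)
    # Per-attribute differences; a categorical mismatch counts as 1.
    diffs = []
    for cand in candidates:
        row = []
        for j in attrs:
            if is_cat[j]:
                row.append(0.0 if cand[j] == target[j] else 1.0)
            else:
                row.append(abs(cand[j] - target[j]))
        diffs.append(row)
    flat = [d for row in diffs for d in row]
    d_min, d_max = min(flat), max(flat)
    if d_max == 0:  # every candidate matches the target exactly
        return [1.0] * len(candidates)
    return [
        sum((d_min + rho * d_max) / (d + rho * d_max) for d in row) / len(row)
        for row in diffs
    ]


def gknn_impute(rows, is_cat, k=3, rho=0.5):
    """Single-pass sketch: impute None entries from the k nearest pool rows.

    Each imputed row is added to the pool, so later rows can use it
    as if it were observed data.
    """
    pool = [list(r) for r in rows if None not in r]
    if not pool:
        raise ValueError("at least one complete row is required")
    result = []
    for r in rows:
        r = list(r)
        missing = [j for j, v in enumerate(r) if v is None]
        if missing:
            observed = [j for j, v in enumerate(r) if v is not None]
            grades = gray_grades(r, pool, observed, is_cat, rho)
            order = sorted(range(len(pool)), key=lambda i: grades[i], reverse=True)
            neighbors = [pool[i] for i in order[:k]]
            for j in missing:
                vals = [n[j] for n in neighbors]
                if is_cat[j]:
                    r[j] = Counter(vals).most_common(1)[0][0]  # mode
                else:
                    r[j] = sum(vals) / len(vals)  # mean
            pool.append(r)
        result.append(r)
    return result
```

Because the grade is computed per attribute and then averaged, numerical and categorical attributes can be mixed in one row without a separate encoding step, which is the property the abstract attributes to the gray distance.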
Pages: 2541 - 2552
Number of pages: 12
Related papers
50 in total
  • [21] Sleep Staging Using Photoplethysmography Signal and kNN Nearest Neighbor Algorithm
    Tuna, Serhat
    Bozkurt, Mehmet Recep
    Ucar, Muhammed Kursad
    Bilgin, Cahit
    2016 24TH SIGNAL PROCESSING AND COMMUNICATION APPLICATION CONFERENCE (SIU), 2016, : 1373 - 1376
  • [22] An Improved ML-kNN Algorithm by Fusing Nearest Neighbor Classification
    Zeng, Yong
    Fu, Hao-ming
    Zhang, Yu-ping
    Zhao, Xi-ya
    INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND COMPUTER SCIENCE (AICS 2016), 2016, : 193 - 198
  • [23] Incomplete-case nearest neighbor imputation in software measurement data
    Van Hulse, Jason
    Khoshgoftaar, Taghi M.
    INFORMATION SCIENCES, 2014, 259 : 596 - 610
  • [24] Incomplete-case nearest neighbor imputation in software measurement data
    Van Hulse, Jason
    Khoshgoftaar, Taghi M.
    IRI 2007: PROCEEDINGS OF THE 2007 IEEE INTERNATIONAL CONFERENCE ON INFORMATION REUSE AND INTEGRATION, 2007, : 630 - +
  • [25] Differentially Private k-Nearest Neighbor Missing Data Imputation
    Clifton, Chris
    Hanson, Eric J.
    Merrill, Keith
    Merrill, Shawn
    ACM TRANSACTIONS ON PRIVACY AND SECURITY, 2022, 25 (03)
  • [26] Feature selection and weighting by nearest neighbor ensembles
    Gertheiss, Jan
    Tutz, Gerhard
    CHEMOMETRICS AND INTELLIGENT LABORATORY SYSTEMS, 2009, 99 (01) : 30 - 38
  • [27] Selection of a Metric for the Nearest Neighbor Entropy Estimators
    Timofeev E.
    Journal of Mathematical Sciences, 2014, 203 (6) : 892 - 906
  • [28] Prototype selection to improve monotonic nearest neighbor
    Cano, Jose-Ramon
    Aljohani, Naif R.
    Abbasi, Rabeeh Ayaz
    Alowidbi, Jalal S.
    Garcia, Salvador
    ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2017, 60 : 128 - 135
  • [29] Hash Bit Selection for Nearest Neighbor Search
    Liu, Xianglong
    He, Junfeng
    Chang, Shih-Fu
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2017, 26 (11) : 5367 - 5380
  • [30] Gene Selection by Mutual Nearest Neighbor Approach
    Shashirekha, H. L.
    Wani, Agar Hussain
    2015 INTERNATIONAL CONFERENCE ON EMERGING RESEARCH IN ELECTRONICS, COMPUTER SCIENCE AND TECHNOLOGY (ICERECT), 2015, : 398 - 402