Nearest neighbor selection for iteratively kNN imputation

Cited by: 267
Author
Zhang, Shichao [1, 2, 3]
Affiliations
[1] Guangxi Normal Univ, Coll Comp Sci & Informat Technol, Guilin, Peoples R China
[2] Chinese Acad Sci, Inst Comp Technol, Beijing, Peoples R China
[3] Univ Technol Sydney, Fac Engn & Informat Technol, QUIS, Sydney, NSW 2007, Australia
Funding
Australian Research Council
Keywords
Missing data; k nearest neighbors; kNN imputation; MISSING VALUE ESTIMATION; CLASSIFICATION; PREDICTION; LIKELIHOOD; ALGORITHM; SYSTEMS; VALUES;
DOI
10.1016/j.jss.2012.05.073
Chinese Library Classification (CLC) number
TP31 [Computer software]
Discipline classification code
081202; 0835
Abstract
Existing kNN imputation methods for dealing with missing data are designed according to the Minkowski distance or its variants, and have been shown to be generally efficient for numerical variables (features, or attributes). To deal with heterogeneous (i.e., mixed-attribute) data, we propose a novel kNN (k nearest neighbor) imputation method that iteratively imputes missing data, named GkNN (gray kNN) imputation. GkNN selects the k nearest neighbors for each missing datum by calculating the gray distance between the missing datum and all the training data, rather than using traditional distance metrics such as the Euclidean distance. Such a distance metric can handle both numerical and categorical attributes. To improve effectiveness, GkNN treats all imputed instances (i.e., instances whose missing values have been imputed) as observed data and uses them together with the complete instances (instances without missing values) to iteratively impute the remaining missing data. We experimentally evaluate the proposed approach and demonstrate that the gray distance is much better than the Minkowski distance at both capturing the proximity relationship (or nearness) of two instances and dealing with mixed attributes. Moreover, the experimental results show that the GkNN algorithm is much more efficient than existing kNN imputation methods. (c) 2012 Elsevier Inc. All rights reserved.
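The abstract does not state the gray distance formula, so the following is only a minimal sketch of the idea under stated assumptions, not the paper's algorithm. It assumes the standard gray relational grade from gray relational analysis (distinguishing coefficient 0.5), a 0/1 mismatch for categorical attributes, and numerical attributes already scaled to [0, 1]. The names `gray_grades` and `gknn_impute` are hypothetical, and the single pass in which each imputed row joins the donor pool is a simplification of the iterative scheme described above. It also assumes at least one complete row and at least one observed value in every incomplete row.

```python
from collections import Counter

RHO = 0.5  # distinguishing coefficient; 0.5 is a conventional choice (assumption)


def gray_grades(target, donors, cols, categorical):
    """Gray relational grade of each donor to `target` over `cols` (higher = nearer)."""
    diffs = [
        [
            (0.0 if target[c] == d[c] else 1.0) if c in categorical
            else abs(target[c] - d[c])
            for c in cols
        ]
        for d in donors
    ]
    flat = [v for row in diffs for v in row]
    lo, hi = min(flat), max(flat)
    if hi == 0.0:  # every donor matches the target exactly
        return [1.0] * len(donors)
    return [
        sum((lo + RHO * hi) / (v + RHO * hi) for v in row) / len(row)
        for row in diffs
    ]


def gknn_impute(rows, categorical, k=2):
    """Fill None entries; each imputed row joins the donor pool for later rows."""
    pool = [list(r) for r in rows if None not in r]
    result = []
    for r in rows:
        r = list(r)  # work on a copy so the input is left untouched
        missing = [i for i, v in enumerate(r) if v is None]
        if missing:
            observed = [i for i in range(len(r)) if i not in missing]
            grades = gray_grades(r, pool, observed, categorical)
            ranked = sorted(range(len(pool)), key=lambda j: grades[j], reverse=True)
            nearest = [pool[j] for j in ranked[:k]]
            for i in missing:
                vals = [n[i] for n in nearest]
                if i in categorical:
                    r[i] = Counter(vals).most_common(1)[0][0]  # mode
                else:
                    r[i] = sum(vals) / len(vals)  # mean
            pool.append(r)  # imputed instance is now treated as observed
        result.append(r)
    return result
```

Called as `gknn_impute(rows, categorical={1}, k=2)` on a list of rows whose second column is categorical, the function returns a filled copy. Numerical gaps receive the mean of the k nearest donors and categorical gaps receive their mode.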
Pages: 2541-2552
Number of pages: 12