Nearest neighbor selection for iteratively kNN imputation

Cited by: 266
Authors
Zhang, Shichao [1 ,2 ,3 ]
Affiliations
[1] Guangxi Normal Univ, Coll Comp Sci & Informat Technol, Guilin, Peoples R China
[2] Chinese Acad Sci, Inst Comp Technol, Beijing, Peoples R China
[3] Univ Technol Sydney, Fac Engn & Informat Technol, QUIS, Sydney, NSW 2007, Australia
Funding
Australian Research Council
Keywords
Missing data; k nearest neighbors; kNN imputation; MISSING VALUE ESTIMATION; CLASSIFICATION; PREDICTION; LIKELIHOOD; ALGORITHM; SYSTEMS; VALUES;
DOI
10.1016/j.jss.2012.05.073
CLC number
TP31 [Computer Software]
Discipline codes
081202; 0835
Abstract
Existing kNN imputation methods for dealing with missing data are designed according to Minkowski distance or its variants, and have been shown to be generally efficient for numerical variables (features, or attributes). To deal with heterogeneous (i.e., mixed-attribute) data, we propose a novel kNN (k nearest neighbor) imputation method that iteratively imputes missing data, named GkNN (gray kNN) imputation. GkNN selects the k nearest neighbors for each missing datum by calculating the gray distance between the missing datum and all the training data, rather than using traditional distance metrics such as Euclidean distance. This distance metric can handle both numerical and categorical attributes. To achieve better effectiveness, GkNN regards all imputed instances (i.e., missing data that have been imputed) as observed data, which, together with the complete instances (instances without missing values), are used to iteratively impute the remaining missing data. We experimentally evaluate the proposed approach and demonstrate that the gray distance is much better than the Minkowski distance both at capturing the proximity relationship (or nearness) of two instances and at dealing with mixed attributes. Moreover, experimental results also show that the GkNN algorithm is much more efficient than existing kNN imputation methods. (c) 2012 Elsevier Inc. All rights reserved.
Pages: 2541-2552 (12 pages)
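The abstract does not give the exact gray distance formula, so the sketch below uses the conventional gray relational coefficient from gray relational analysis, (Δmin + ρΔmax)/(Δ + ρΔmax) with ρ = 0.5, treats categorical attributes as 0/1 mismatches, and imputes with the neighbors' mean (numeric) or mode (categorical). All of these specifics are assumptions for illustration, not the paper's exact procedure; the iterative step, where imputed instances rejoin the reference pool as observed data, follows the abstract's description.

```python
import numpy as np

RHO = 0.5  # distinguishing coefficient; 0.5 is the conventional default


def grey_grade(x, pool, numeric):
    """Gray relational grade of each complete row in `pool` w.r.t. the
    incomplete instance `x` (higher grade = nearer neighbor).
    Differences: |a - b| for numeric attributes, 0/1 mismatch for
    categorical ones (an assumption; the paper may treat them differently)."""
    delta = np.array([
        [abs(a - b) if numeric[j] else float(a != b)
         for j, (a, b) in enumerate(zip(x, y)) if a is not None]
        for y in pool
    ])
    dmin, dmax = delta.min(), delta.max()
    coeff = (dmin + RHO * dmax) / (delta + RHO * dmax + 1e-12)
    return coeff.mean(axis=1)  # average coefficient over observed attributes


def gknn_impute(data, numeric, k=3, n_iter=2):
    """Iterative GkNN-style imputation: neighbors are chosen by gray
    relational grade, and imputed instances join the reference pool
    ('regarded as observed data') in subsequent rounds."""
    data = [list(row) for row in data]
    for _ in range(n_iter):
        pool = [row for row in data if None not in row]  # complete + imputed rows
        for row in data:
            if None not in row or not pool:
                continue
            grade = grey_grade(row, pool, numeric)
            nbrs = [pool[i] for i in np.argsort(grade)[::-1][:k]]  # k highest grades
            for j, v in enumerate(row):
                if v is None:
                    vals = [n[j] for n in nbrs]
                    row[j] = (float(np.mean(vals)) if numeric[j]
                              else max(set(vals), key=vals.count))  # mean / mode
    return data


# Toy mixed-attribute dataset: two numeric columns, one categorical column.
numeric = [True, False, True]
data = [
    [1.0, "a", 10.0],
    [1.2, "a", 11.0],
    [5.0, "b", 50.0],
    [5.2, "b", 52.0],
    [1.1, "a", None],   # numeric value missing
    [5.1, None, 51.0],  # categorical value missing
]
completed = gknn_impute(data, numeric, k=2)
```

Because newly completed rows enter `pool` on the next pass, later missing values can borrow from earlier imputations, which is what distinguishes the iterative scheme from a single-pass kNN imputation.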
Related Papers
50 records
  • [1] A new two-layer nearest neighbor selection method for kNN classifier
    Wang, Yikun
    Pan, Zhibin
    Dong, Jing
    KNOWLEDGE-BASED SYSTEMS, 2022, 235
  • [2] Predicting the number of nearest neighbor for kNN classifier
    Li, Yanying
    Yang, Youlong
    Che, Jinxing
    Zhang, Long
    IAENG International Journal of Computer Science, 2019, 46 (04) : 1 - 8
  • [3] Nearest neighbor imputation algorithms: a critical evaluation
    Beretta, Lorenzo
    Santaniello, Alessandro
    BMC MEDICAL INFORMATICS AND DECISION MAKING, 2016, 16
  • [4] Multiple imputation using nearest neighbor methods
    Faisal, Shahla
    Tutz, Gerhard
    INFORMATION SCIENCES, 2021, 570 : 500 - 516
  • [5] Nonparametric Variance Estimation for Nearest Neighbor Imputation
    Shao, Jun
    JOURNAL OF OFFICIAL STATISTICS, 2009, 25 (01) : 55 - 62
  • [6] Nearest neighbor imputation for categorical data by weighting of attributes
    Faisal, Shahla
    Tutz, Gerhard
    INFORMATION SCIENCES, 2022, 592 : 306 - 319
  • [7] Jackknife variance estimation for nearest-neighbor imputation
    Chen, JH
    Shao, J
    JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, 2001, 96 (453) : 260 - 269
  • [8] Iteratively Multiple Projections Optimization for Product Quantization in Nearest Neighbor Search
    Li, Jin
    Lan, Xuguang
    Zheng, Nanning
    2017 IEEE INTERNATIONAL CONFERENCE ON BIG KNOWLEDGE (IEEE ICBK 2017), 2017, : 65 - 71