A Large-Scale k-Nearest Neighbor Classification Algorithm Based on Neighbor Relationship Preservation

Cited by: 6
Authors
Song, Yunsheng [1]
Kong, Xiaohan [1]
Zhang, Chao [1]
Affiliations
[1] Shandong Agr Univ, Coll Informat Sci & Engn, Tai An 271018, Shandong, Peoples R China
Keywords
CONDENSATION; SELECTION;
DOI
10.1155/2022/7409171
Chinese Library Classification number
TP [Automation Technology, Computer Technology];
Subject classification code
0812;
Abstract
Because it makes no assumptions about the underlying data distribution and has strong generalization ability, the k-nearest neighbor (kNN) classification algorithm is widely used in face recognition, text classification, emotion analysis, and other fields. However, kNN must compute the similarity between the unlabeled instance and every training instance at prediction time, which makes it difficult to apply to large-scale data. To overcome this difficulty, an increasing number of acceleration algorithms based on data partition have been proposed, but they lack theoretical analysis of the effect of data partition on classification performance. This paper analyzes that effect theoretically using empirical risk minimization and proposes a large-scale k-nearest neighbor classification algorithm based on neighbor relationship preservation. The search for the nearest neighbors is converted into a constrained optimization problem, and the paper then estimates the difference in the objective function value at the optimal solution with and without data partition. According to this estimate, minimizing the similarity between instances in different subsets largely reduces the effect of data partition. The minibatch k-means clustering algorithm is chosen to perform the data partition for its effectiveness and efficiency. Finally, the nearest neighbors of a test instance are searched repeatedly in the set generated by successively merging candidate subsets until they no longer change, where the candidate subsets are selected based on the similarity between the test instance and the cluster centers. Experimental results on public datasets show that the proposed algorithm largely preserves the same nearest neighbors as the original kNN classification algorithm, shows no significant difference from it in classification accuracy, and achieves better results than two state-of-the-art algorithms.
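
The partition-then-merge search described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the squared Euclidean distance, the majority vote, and the exact stopping rule (stop once merging another candidate subset leaves the k nearest neighbors unchanged) are all assumptions made for illustration, and the mini-batch k-means here is a plain standard-library version with per-center learning rates.

```python
import random
from collections import Counter


def sq_dist(a, b):
    """Squared Euclidean distance (assumed similarity measure)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_center(p, centers):
    return min(range(len(centers)), key=lambda c: sq_dist(p, centers[c]))


def minibatch_kmeans(points, n_clusters, batch_size=32, n_iter=100, seed=0):
    """Mini-batch k-means: each center moves toward its assigned batch
    points with a learning rate of 1 / (points seen by that center)."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, n_clusters)]
    counts = [0] * n_clusters
    for _ in range(n_iter):
        batch = [rng.choice(points) for _ in range(batch_size)]
        assigned = [nearest_center(p, centers) for p in batch]
        for p, c in zip(batch, assigned):
            counts[c] += 1
            eta = 1.0 / counts[c]
            centers[c] = [(1 - eta) * m + eta * x
                          for m, x in zip(centers[c], p)]
    return centers


def partition(points, centers):
    """Split training indices into subsets by nearest cluster center."""
    subsets = [[] for _ in centers]
    for i, p in enumerate(points):
        subsets[nearest_center(p, centers)].append(i)
    return subsets


def partitioned_knn_predict(query, points, labels, centers, subsets, k):
    """Merge candidate subsets in order of center distance until the
    k nearest neighbors stop changing, then take a majority vote."""
    order = sorted(range(len(centers)),
                   key=lambda c: sq_dist(query, centers[c]))
    pool, previous = [], None
    for c in order:
        if not subsets[c]:
            continue  # an empty subset cannot change the neighbors
        pool.extend(subsets[c])
        current = sorted(pool,
                         key=lambda i: (sq_dist(query, points[i]), i))[:k]
        if len(current) == k and current == previous:
            break  # assumed stopping rule: neighbors are stable
        previous = current
    votes = Counter(labels[i] for i in previous)
    return votes.most_common(1)[0][0], previous
```

A typical call sequence would be `centers = minibatch_kmeans(X, m)`, then `subsets = partition(X, centers)` once, and `partitioned_knn_predict(q, X, y, centers, subsets, k)` per test instance. Because the search can stop before all subsets are merged, the returned neighbors are not guaranteed to match exhaustive kNN on every input.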
Pages: 11
Related papers
50 in total
  • [41] Gravity-Matching Algorithm Based on K-Nearest Neighbor
    Gao, Shuaipeng
    Cai, Tijing
    Fang, Ke
    SENSORS, 2022, 22 (12)
  • [42] Protein Sequence Classification Based on N-Gram and K-Nearest Neighbor Algorithm
    Dongardive, Jyotshna
    Abraham, Siby
    COMPUTATIONAL INTELLIGENCE IN DATA MINING, CIDM, VOL 2, 2016, 411 : 163 - 171
  • [43] A Grid-based k-Nearest Neighbor Join for Large Scale Datasets on MapReduce
    Jang, Miyoung
    Shin, Young-Sung
    Chang, Jae-Woo
    2015 IEEE 17TH INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING AND COMMUNICATIONS, 2015 IEEE 7TH INTERNATIONAL SYMPOSIUM ON CYBERSPACE SAFETY AND SECURITY, AND 2015 IEEE 12TH INTERNATIONAL CONFERENCE ON EMBEDDED SOFTWARE AND SYSTEMS (ICESS), 2015, : 888 - 891
  • [44] A k-nearest neighbor approach for chromosome shape classification
    Serbanescu, Mircea Sebastian
    ANNALS OF THE UNIVERSITY OF CRAIOVA-MATHEMATICS AND COMPUTER SCIENCE SERIES, 2010, 37 (03): : 142 - 146
  • [45] K-Nearest Neighbor Intervals Based AP Clustering Algorithm for Large Incomplete Data
    Lu, Cheng
    Song, Shiji
    Wu, Cheng
    MATHEMATICAL PROBLEMS IN ENGINEERING, 2015, 2015
  • [46] A sequential weighted k-nearest neighbor classification method
    Zhu, Ming-Han
    Luo, Da-Yong
    Yi, Li-Qun
    Tien Tzu Hsueh Pao/Acta Electronica Sinica, 2009, 37 (11): : 2584 - 2588
  • [47] IKNN: Informative K-nearest neighbor pattern classification
    Song, Yan
    Huang, Jian
    Zhou, Ding
    Zha, Hongyuan
    Giles, C. Lee
    KNOWLEDGE DISCOVERY IN DATABASES: PKDD 2007, PROCEEDINGS, 2007, 4702 : 248 - +
  • [48] Improving K-Nearest Neighbor Efficacy for Farsi Text Classification
    Elahimanesh, Mohammad Hossein
    Minaei-Bidgoli, Behrouz
    Malekinezhad, Hossein
    LREC 2012 - EIGHTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2012, : 1618 - 1621
  • [49] k-Nearest Neighbor Classification Using Dissimilarity Increments
    Aidos, Helena
    Fred, Ana
    IMAGE ANALYSIS AND RECOGNITION, PT I, 2012, 7324 : 27 - 33
  • [50] K-Nearest Neighbor Classification for Glass Identification Problem
    Aldayel, Mashael S.
    2012 INTERNATIONAL CONFERENCE ON COMPUTER SYSTEMS AND INDUSTRIAL INFORMATICS (ICCSII), 2012,