A Large-Scale k-Nearest Neighbor Classification Algorithm Based on Neighbor Relationship Preservation

Cited by: 6
Authors
Song, Yunsheng [1 ]
Kong, Xiaohan [1 ]
Zhang, Chao [1 ]
Affiliations
[1] Shandong Agr Univ, Coll Informat Sci & Engn, Tai An 271018, Shandong, Peoples R China
Keywords
CONDENSATION; SELECTION;
DOI
10.1155/2022/7409171
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812 ;
Abstract
Because it makes no assumptions about the underlying distribution of the data and has strong generalization ability, the k-nearest neighbor (kNN) classification algorithm is widely used in face recognition, text classification, emotion analysis, and other fields. However, kNN must compute the similarity between the unlabeled instance and every training instance during prediction, which makes it difficult to apply to large-scale data. To overcome this difficulty, a growing number of acceleration algorithms based on data partition have been proposed, but they lack a theoretical analysis of how data partition affects classification performance. This paper analyzes that effect theoretically using empirical risk minimization and proposes a large-scale k-nearest neighbor classification algorithm based on neighbor relationship preservation. The search for the nearest neighbors is converted into a constrained optimization problem, and the paper then estimates the difference in the objective function value at the optimal solution with and without data partition. According to this estimate, minimizing the similarity between instances in different subsets can largely reduce the effect of data partition. The minibatch k-means clustering algorithm is chosen to perform the data partition for its effectiveness and efficiency. Finally, the nearest neighbors of a test instance are searched repeatedly in the set generated by successively merging candidate subsets until they no longer change, where the candidate subsets are selected according to the similarity between the test instance and the cluster centers. Experimental results on public datasets show that the proposed algorithm largely preserves the same nearest neighbors as the original kNN classification algorithm, shows no significant difference from it in classification accuracy, and achieves better results than two state-of-the-art algorithms.
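The search procedure described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the Euclidean distance, the simplified minibatch k-means update, and the stopping rule (stop when merging one more cluster leaves the k neighbors unchanged) are all assumptions made for illustration, and this heuristic stop does not by itself guarantee exact neighbors.

```python
import math
import random
from collections import Counter


def nearest_center(x, centers):
    """Index of the center closest to x (Euclidean distance, an assumption)."""
    return min(range(len(centers)), key=lambda c: math.dist(x, centers[c]))


def minibatch_kmeans(X, n_clusters, batch_size=32, n_iter=50, seed=0):
    """Simplified minibatch k-means: per-center learning rate 1 / count."""
    rng = random.Random(seed)
    centers = [list(x) for x in rng.sample(X, n_clusters)]
    counts = [0] * n_clusters
    for _ in range(n_iter):
        batch = [rng.choice(X) for _ in range(batch_size)]
        assigned = [nearest_center(x, centers) for x in batch]
        for x, j in zip(batch, assigned):
            counts[j] += 1
            eta = 1.0 / counts[j]
            centers[j] = [(1 - eta) * c + eta * v for c, v in zip(centers[j], x)]
    return centers


def partition(X, centers):
    """Assign every training index to the cluster of its nearest center."""
    clusters = [[] for _ in centers]
    for i, x in enumerate(X):
        clusters[nearest_center(x, centers)].append(i)
    return clusters


def knn_indices(X, candidates, q, k):
    """k nearest candidates to q, ties broken by index."""
    return sorted(candidates, key=lambda i: (math.dist(X[i], q), i))[:k]


def partitioned_knn_predict(X, y, centers, clusters, q, k):
    """Merge clusters in order of center distance until neighbors stop changing."""
    order = sorted(range(len(centers)), key=lambda c: math.dist(q, centers[c]))
    pool, neighbors = [], None
    for c in order:
        if not clusters[c]:
            continue  # an empty cluster adds no candidates
        pool.extend(clusters[c])
        if len(pool) < k:
            continue  # not enough candidates yet to form k neighbors
        current = knn_indices(X, pool, q, k)
        if current == neighbors:
            break  # assumed stopping rule: neighbor set is stable
        neighbors = current
    if neighbors is None:  # fewer than k training points overall
        neighbors = knn_indices(X, pool, q, k)
    label = Counter(y[i] for i in neighbors).most_common(1)[0][0]
    return neighbors, label
```

A typical call would first compute `centers = minibatch_kmeans(X, n_clusters)` and `clusters = partition(X, centers)` once, then reuse both for every query passed to `partitioned_knn_predict`. Only the clusters whose centers are closest to the query are scanned, which is where the saving over brute-force kNN comes from.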
Pages: 11
Related papers
50 in total
  • [21] Feature Extraction based Text Classification using K-Nearest Neighbor Algorithm
    Azam, Muhammad
    Ahmed, Tanvir
    Sabah, Fahad
    Hussain, Muhammad Iftikhar
    INTERNATIONAL JOURNAL OF COMPUTER SCIENCE AND NETWORK SECURITY, 2018, 18 (12): : 95 - 101
  • [22] Joint Evidential K-Nearest Neighbor Classification
    Gong, Chaoyu
    Li, Yongbin
    Liu, Yong
    Wang, Pei-hong
    You, Yang
    2022 IEEE 38TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING (ICDE 2022), 2022, : 2113 - 2126
  • [23] Multiview Adaptive K-Nearest Neighbor Classification
    School of Science, East China Jiaotong University, Nanchang 330013, China
    IEEE Trans. Artif. Intell., 2024, 3 : 1221 - 1234
  • [24] Rates of Convergence for Large-scale Nearest Neighbor Classification
    Qiao, Xingye
    Duan, Jiexin
    Cheng, Guang
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 32 (NIPS 2019), 2019, 32
  • [25] Privacy preserving K-nearest neighbor classification
    Zhan, Justin
    Chang, Li Wu
    Matwin, Stan
    International Journal of Network Security, 2005, 1 (01) : 46 - 51
  • [26] FUZZY K-NEAREST NEIGHBOR ALGORITHM.
    Keller, James M.
    Gray, Michael R.
    Givens, James A.
    IEEE Transactions on Systems, Man and Cybernetics, 1985, SMC-15 (04): : 580 - 585
  • [27] Optimization of the Neighbor Parameter of k-Nearest Neighbor Algorithm for Collaborative Filtering
    Vaghela, Vimalkumar B.
    Pathak, Himalay H.
    PROCEEDINGS OF INTERNATIONAL CONFERENCE ON COMMUNICATION AND NETWORKS, 2017, 508 : 87 - 93
  • [28] Protein kinase inhibitors’ classification using K-Nearest neighbor algorithm
    Arian, Roya
    Hariri, Amirali
    Mehridehnavi, Alireza
    Fassihi, Afshin
    Ghasemi, Fahimeh
    Computational Biology and Chemistry, 2020, 86
  • [29] Sparse Coefficient-Based k-Nearest Neighbor Classification
    Ma, Hongxing
    Gou, Jianping
    Wang, Xili
    Ke, Jia
    Zeng, Shaoning
    IEEE ACCESS, 2017, 5 : 16618 - 16634
  • [30] Effective Classification of EEG Signals using K-Nearest Neighbor Algorithm
    Awan, Umer I.
    Rajput, U. H.
    Syed, Ghazaal
    Iqbal, Rimsha
    Sabat, Ifra
    Mansoor, M.
    PROCEEDINGS OF 14TH INTERNATIONAL CONFERENCE ON FRONTIERS OF INFORMATION TECHNOLOGY PROCEEDINGS - FIT 2016, 2016, : 120 - 124