A depth-based nearest neighbor algorithm for high-dimensional data classification

Cited by: 0
Authors
Harikumar S. [1]
Aravindakshan Savithri A. [1]
Kaimal R. [1]
Affiliation
[1] Department of Computer Science and Engineering, Amrita Vishwa Vidyapeetham, Amritapuri
Keywords
Classification; Data depth; Information gain; Nearest neighbor; Subspace clustering
DOI
10.3906/ELK-1807-163
Abstract
Nearest neighbor algorithms such as k-nearest neighbors (kNN) are fundamental supervised learning techniques that classify a query instance based on the class labels of its neighbors. Quite often, however, large datasets are not fully labeled, and the unknown probability distribution of the instances may be uneven. Moreover, kNN suffers from the curse of dimensionality, from the difficulty of setting the optimal number of neighbors, and from poor scalability on high-dimensional data. To overcome these challenges, we propose an improved approach to classification via a depth representation of subspace clusters formed from high-dimensional data. We offer a consistent and principled approach to dynamically choosing the nearest neighbors for classifying a query point by i) identifying the structures and distributions of the data; ii) extracting relevant features; and iii) deriving an optimum value of k that depends on the structure of the data, which is represented using a data depth function. The resulting classification algorithm uses a depth-based representation of clusters to improve performance in terms of execution time and accuracy. Experiments on real-world datasets show that the proposed approach is at least two orders of magnitude faster on high-dimensional datasets and at least as accurate as traditional kNN. © TÜBİTAK.
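The idea of classifying a query by its depth within each cluster, instead of by its distance to every training point, can be illustrated with a minimal sketch. This is not the authors' algorithm: the abstract does not state which depth function, subspace clustering method, or rule for choosing k they use. The sketch assumes Mahalanobis depth, treats each labeled group as a single cluster, and the names `mahalanobis_depth`, `classify_by_depth`, and `clusters` are hypothetical.

```python
import numpy as np


def mahalanobis_depth(x, cluster):
    """Mahalanobis depth of x in a cluster: 1 / (1 + squared distance).

    Assumed depth function for illustration; values lie in (0, 1],
    with 1 reached at the cluster mean.
    """
    mu = cluster.mean(axis=0)
    cov = np.atleast_2d(np.cov(cluster, rowvar=False))
    # Pseudo-inverse keeps the computation defined for singular covariances.
    inv_cov = np.linalg.pinv(cov)
    diff = x - mu
    return 1.0 / (1.0 + float(diff @ inv_cov @ diff))


def classify_by_depth(x, clusters):
    """Return the label of the cluster in which x is deepest.

    clusters: dict mapping label -> array of shape (n_points, n_features).
    """
    depths = {label: mahalanobis_depth(x, pts) for label, pts in clusters.items()}
    return max(depths, key=depths.get)


# Synthetic data (an assumption, not from the paper): two Gaussian blobs.
rng = np.random.default_rng(0)
clusters = {
    "a": rng.normal(loc=0.0, scale=1.0, size=(50, 2)),
    "b": rng.normal(loc=5.0, scale=1.0, size=(50, 2)),
}

query = np.array([0.1, -0.2])
print(classify_by_depth(query, clusters))
```

Under these assumptions, each query costs one depth evaluation per cluster rather than one distance computation per training point, which is one plausible source of the kind of speedup the abstract reports.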
Pages: 4082-4101
Page count: 19