A depth-based nearest neighbor algorithm for high-dimensional data classification

Authors
Harikumar S. [1]
Aravindakshan Savithri A. [1]
Kaimal R. [1]
Affiliations
[1] Department of Computer Science and Engineering, Amrita Vishwa Vidyapeetham, Amritapuri
Keywords
Classification; Data-depth; Information gain; Nearest neighbor; Subspace-clustering
DOI
10.3906/ELK-1807-163
Abstract
Nearest neighbor algorithms such as k-nearest neighbors (kNN) are fundamental supervised learning techniques that classify a query instance based on the class labels of its neighbors. However, large datasets are often not fully labeled, and the unknown probability distribution of the instances may be uneven. Moreover, kNN suffers from challenges such as the curse of dimensionality, the choice of the optimal number of neighbors, and poor scalability for high-dimensional data. To overcome these challenges, we propose an improved approach to classification via a depth representation of subspace clusters formed from high-dimensional data. We offer a consistent and principled approach to dynamically choosing the nearest neighbors for classification of a query point by i) identifying structures and distributions of the data; ii) extracting relevant features; and iii) deriving an optimum value of k that depends on the structure of the data, which is represented using a data depth function. The resulting classification algorithm uses a depth-based representation of clusters to improve performance in terms of execution time and accuracy. Experiments on real-world datasets show that the proposed approach is at least two orders of magnitude faster on high-dimensional datasets and at least as accurate as traditional kNN. © TÜBİTAK.
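As a rough illustration of the two ideas the abstract contrasts, the sketch below places a plain kNN majority vote next to a simple maximum-depth classifier built on the spatial (L1) depth function. It is not the authors' algorithm: the paper's method works on subspace clusters and derives k from the data, neither of which is reproduced here. The function names (`knn_predict`, `spatial_depth`, `max_depth_predict`), the choice of spatial depth, and the toy data are all assumptions made for illustration.

```python
import math
from collections import Counter

def knn_predict(points, labels, query, k=3):
    """Plain kNN: majority vote among the k training points closest to query."""
    order = sorted(range(len(points)), key=lambda i: math.dist(points[i], query))
    votes = Counter(labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]

def spatial_depth(query, cloud):
    """Spatial (L1) depth: 1 minus the norm of the mean unit vector
    pointing from each cloud point to the query. Values near 1 mean the
    query is central to the cloud; values near 0 mean it lies outside."""
    dim = len(query)
    total = [0.0] * dim
    for p in cloud:
        diff = [q - x for q, x in zip(query, p)]
        norm = math.sqrt(sum(d * d for d in diff))
        if norm == 0.0:
            continue  # a point equal to the query contributes no direction
        for j in range(dim):
            total[j] += diff[j] / norm
    return 1.0 - math.sqrt(sum(t * t for t in total)) / len(cloud)

def max_depth_predict(points, labels, query):
    """Assign the query to the class in which it is deepest (assumed rule,
    standing in for the paper's depth-based cluster representation)."""
    groups = {}
    for p, y in zip(points, labels):
        groups.setdefault(y, []).append(p)
    return max(groups, key=lambda y: spatial_depth(query, groups[y]))

# Hypothetical toy data: two well-separated classes in the plane.
points = [(0, 0), (1, 0), (0, 1), (1, 1), (5, 5), (6, 5), (5, 6), (6, 6)]
labels = ["a", "a", "a", "a", "b", "b", "b", "b"]
query = (0.5, 0.5)

knn_label = knn_predict(points, labels, query, k=3)
depth_label = max_depth_predict(points, labels, query)
```

Note that the depth-based rule needs no k at all: the query is compared against each class as a whole, which is one way to read the abstract's point about avoiding a fixed neighbor count.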
Pages: 4082-4101
Page count: 19