A depth-based nearest neighbor algorithm for high-dimensional data classification

Cited by: 0

Authors:

Harikumar S. [1]

Aravindakshan Savithri A. [1]

Kaimal R. [1]

Affiliation:

[1] Department of Computer Science and Engineering, Amrita Vishwa Vidyapeetham, Amritapuri

Source:

Turkish Journal of Electrical Engineering and Computer Sciences | 2019, Vol. 27, No. 6

Keywords:

Classification; Data-depth; Information gain; Nearest neighbor; Subspace-clustering;

DOI:

10.3906/ELK-1807-163

Abstract:

Nearest neighbor algorithms such as k-nearest neighbors (kNN) are fundamental supervised learning techniques that classify a query instance based on the class labels of its neighbors. However, large datasets are often not fully labeled, and the unknown probability distribution of the instances may be uneven. Moreover, kNN suffers from challenges such as the curse of dimensionality, the choice of the optimal number of neighbors, and scalability for high-dimensional data. To overcome these challenges, we propose an improvised approach of classification via depth representation of subspace clusters formed from high-dimensional data. We offer a consistent and principled approach to dynamically choose the nearest neighbors for classification of a query point by i) identifying structures and distributions of data; ii) extracting relevant features; and iii) deriving an optimum value of k depending on the structure of the data by representing the data using a data depth function. We propose an improvised classification algorithm using a depth-based representation of clusters to improve performance in terms of execution time and accuracy. Experiments on real-world datasets reveal that the proposed approach is at least two orders of magnitude faster for high-dimensional datasets and is at least as accurate as traditional kNN. © TÜBİTAK.
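
To make the two ideas in the abstract concrete, the following is a minimal Python sketch, not the authors' algorithm. It contrasts a baseline kNN vote with a maximum-depth rule that assigns a query to the group in which it lies deepest. The abstract does not say which depth function is used or how the subspace clusters are formed, so everything here is an assumption for illustration: spatial depth stands in for the depth function, the class groups stand in for clusters, and the names `knn_predict`, `spatial_depth`, and `max_depth_predict` are hypothetical.

```python
import math
from collections import Counter


def knn_predict(train, labels, query, k=3):
    """Baseline kNN: majority label among the k closest training points."""
    order = sorted(range(len(train)), key=lambda i: math.dist(train[i], query))
    votes = Counter(labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]


def spatial_depth(points, query):
    """Spatial depth (assumed choice): 1 minus the norm of the mean unit
    vector pointing from each data point to the query. Values near 1 mean
    the query is central to the group; values near 0 mean it is outlying."""
    dim = len(query)
    total = [0.0] * dim
    for p in points:
        d = math.dist(p, query)
        if d == 0.0:
            continue  # a coincident point contributes a zero vector
        for j in range(dim):
            total[j] += (query[j] - p[j]) / d
    mean = [t / len(points) for t in total]
    return 1.0 - math.hypot(*mean)


def max_depth_predict(train, labels, query):
    """Assign the query to the group in which its depth is largest.
    Groups are the labeled classes here, standing in for clusters."""
    groups = {}
    for p, y in zip(train, labels):
        groups.setdefault(y, []).append(p)
    return max(groups, key=lambda y: spatial_depth(groups[y], query))


# Toy data: two well-separated squares of four points each.
train = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5), (5, 6), (6, 5), (6, 6)]
labels = ["a"] * 4 + ["b"] * 4
query = (0.5, 0.5)

knn_label = knn_predict(train, labels, query, k=3)
depth_label = max_depth_predict(train, labels, query)
```

Unlike the kNN vote, the depth rule needs no value of k: each group is summarized by how central the query is to it, which is one way to read the abstract's idea of letting the structure of the data determine the neighborhood.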

Pages: 4082-4101

Page count: 19
