Evaluation of k-nearest neighbour classifier performance for heterogeneous data sets

Times cited: 120
Authors
Ali, Najat [1]
Neagu, Daniel [1]
Trundle, Paul [1]
Affiliation
[1] Univ Bradford, Fac Engn & Informat, Bradford BD7 1DP, W Yorkshire, England
Source
SN Applied Sciences | 2019 / Vol. 1 / No. 12
Keywords
k-nearest neighbour; Heterogeneous data set; Combination similarity measures; Similarity measure
DOI
10.1007/s42452-019-1356-9
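Chinese Library Classification (CLC) codes
O [Mathematical sciences and chemistry]; P [Astronomy, earth sciences]; Q [Biological sciences]; N [Natural sciences, general]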
Subject classification codes
07; 0710; 09
Abstract
Distance-based algorithms are widely used for data classification problems, and the k-nearest neighbour (k-NN) classifier is one of the most popular of them. k-NN measures the distances between a test sample and the training samples, and the k nearest training samples determine the final classification output. The traditional k-NN classifier works naturally with numerical data. The main objective of this paper is to investigate the performance of k-NN on heterogeneous data sets, in which samples are described by a mixture of numerical and categorical features. For simplicity, this work considers only one type of categorical data: binary data. Several similarity measures are defined by combining well-known distances for numerical data with well-known distances for binary data, and these measures are used to investigate k-NN performance when classifying such heterogeneous data sets. The experiments used six heterogeneous data sets from different domains and two categories of measures. The results showed that the proposed measures performed better on heterogeneous data than Euclidean distance, and that the challenges raised by the nature of heterogeneous data call for personalised similarity measures adapted to the data characteristics.
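The abstract does not state exactly how the numerical and binary distances are combined, so the following is only a minimal sketch under one assumed combination: a weighted sum of Euclidean distance over the numeric features and normalised Hamming (simple matching) distance over the binary features, plugged into a plain majority-vote k-NN. The function names (`combined_distance`, `knn_predict`), the weight `w`, and the toy data are hypothetical illustrations, not the authors' measures. It also assumes the numeric features are already min-max scaled to [0, 1] so that neither part of the sum dominates.

```python
import math
from collections import Counter
from functools import partial


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def hamming(a, b):
    """Normalised Hamming (simple matching) distance: share of binary positions that differ."""
    if not a:
        return 0.0
    return sum(x != y for x, y in zip(a, b)) / len(a)


def combined_distance(x, y, num_idx, bin_idx, w=0.5):
    """Hypothetical combined measure: w * Euclidean(numeric part) + (1 - w) * Hamming(binary part).

    The weighted-sum form and the default w = 0.5 are assumptions for illustration.
    """
    num_x = [x[i] for i in num_idx]
    num_y = [y[i] for i in num_idx]
    bin_x = [x[i] for i in bin_idx]
    bin_y = [y[i] for i in bin_idx]
    return w * euclidean(num_x, num_y) + (1 - w) * hamming(bin_x, bin_y)


def knn_predict(train_X, train_y, query, k, dist):
    """Classify `query` by majority vote among its k nearest training samples under `dist`."""
    ranked = sorted(zip(train_X, train_y), key=lambda pair: dist(pair[0], query))
    votes = Counter(label for _, label in ranked[:k])
    # When counts tie, most_common(1) returns the label first encountered,
    # i.e. the one appearing earliest among the ranked neighbours.
    return votes.most_common(1)[0][0]


# Toy heterogeneous data (hypothetical): two numeric features scaled to [0, 1],
# followed by two binary features.
NUM_IDX, BIN_IDX = [0, 1], [2, 3]
X = [
    [0.10, 0.20, 0, 0],
    [0.20, 0.10, 0, 1],
    [0.15, 0.15, 0, 0],
    [0.90, 0.80, 1, 1],
    [0.80, 0.90, 1, 0],
    [0.85, 0.85, 1, 1],
]
y = ["A", "A", "A", "B", "B", "B"]

mixed = partial(combined_distance, num_idx=NUM_IDX, bin_idx=BIN_IDX, w=0.5)
print(knn_predict(X, y, [0.12, 0.18, 0, 0], k=3, dist=mixed))  # → A
print(knn_predict(X, y, [0.88, 0.82, 1, 1], k=3, dist=mixed))  # → B
```

Passing `dist=euclidean` instead of `mixed` gives a plain Euclidean baseline of the kind the abstract compares against, treating the binary columns as ordinary numbers. Keeping the distance as a callable makes it easy to swap in other numeric/binary pairings (for example a Jaccard-style distance for the binary part) without touching the classifier.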
Pages: 15