A Comparison Study on Similarity and Dissimilarity Measures in Clustering Continuous Data

Cited by: 238
Authors
Shirkhorshidi, Ali Seyed [1]
Aghabozorgi, Saeed [2]
Teh Ying Wah [1]
Affiliations
[1] Univ Malaya, Fac Comp Sci & Informat Technol, Dept Informat Syst, Kuala Lumpur 50603, Malaysia
[2] IBM Canada Ltd, Emerging Technol, IBM Analyt, Platform, Markham, ON L6F 1C7, Canada
Source
PLOS ONE | 2015 / Vol. 10 / Issue 12
Keywords
GENE-EXPRESSION DATA; SERIES;
DOI
10.1371/journal.pone.0144059
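Chinese Library Classification (CLC)
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences];
Discipline classification code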
07 ; 0710 ; 09 ;
Abstract
Similarity or distance measures are core components used by distance-based clustering algorithms to cluster similar data points into the same clusters, while dissimilar or distant data points are placed into different clusters. The performance of similarity measures is mostly addressed in two or three-dimensional spaces, beyond which, to the best of our knowledge, there is no empirical study that has revealed the behavior of similarity measures when dealing with high-dimensional datasets. To fill this gap, a technical framework is proposed in this study to analyze, compare and benchmark the influence of different similarity measures on the results of distance-based clustering algorithms. For reproducibility purposes, fifteen publicly available datasets were used for this study, and consequently, future distance measures can be evaluated and compared with the results of the measures discussed in this work. These datasets were classified as low and high-dimensional categories to study the performance of each measure against each category. This research should help the research community to identify suitable distance measures for datasets and also to facilitate a comparison and evaluation of the newly proposed similarity or distance measures with traditional ones.
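As a rough illustration of why the choice of measure matters to a distance-based clustering algorithm, the sketch below assigns the same toy points to the same two cluster centers under four common measures. The points, the centers, the helper names and the selection of measures are all assumptions made for this example; they are not taken from the paper's framework or datasets.

```python
import math

# Four common distance measures for continuous data.
# (Illustrative selection; not necessarily the set benchmarked in the paper.)
def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def chebyshev(a, b):
    return max(abs(x - y) for x, y in zip(a, b))

def cosine_distance(a, b):
    # 1 - cosine similarity; undefined for zero vectors, which the toy data avoids.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

MEASURES = {
    "euclidean": euclidean,
    "manhattan": manhattan,
    "chebyshev": chebyshev,
    "cosine": cosine_distance,
}

def assign_to_nearest(points, centers, distance):
    """Return, for each point, the index of its nearest center under `distance`."""
    return [
        min(range(len(centers)), key=lambda k: distance(p, centers[k]))
        for p in points
    ]

# Hypothetical toy data: three 2-D points and two fixed cluster centers.
points = [(1.0, 1.0), (5.0, 1.0), (2.0, 3.0)]
centers = [(4.0, 1.0), (3.0, 3.0)]

assignments = {
    name: assign_to_nearest(points, centers, fn) for name, fn in MEASURES.items()
}

for name, labels in assignments.items():
    print(f"{name:10s} -> {labels}")
```

On this toy data the first point is assigned to a different center under the Manhattan distance than under the other three measures, which is the kind of measure-dependent behavior the study sets out to benchmark on real datasets.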
Pages: 20
Related papers
50 in total
  • [1] Similarity and Dissimilarity Measures for Comparison of Propagation Patterns
    Rickard, H. Erin
    Saeger, J. T.
    Hackett, Erin E.
    2015 IEEE INTERNATIONAL SYMPOSIUM ON ANTENNAS AND PROPAGATION & USNC/URSI NATIONAL RADIO SCIENCE MEETING, 2015, : 1118 - 1119
  • [2] Comparison of Similarity Measures for Categorical Data in Hierarchical Clustering
    Sulc, Zdenek
    Rezankova, Hana
    JOURNAL OF CLASSIFICATION, 2019, 36 (01) : 58 - 72
  • [3] Comparison of Similarity Measures for Categorical Data in Hierarchical Clustering
    Zdeněk Šulc
    Hana Řezanková
    Journal of Classification, 2019, 36 : 58 - 72
  • [4] A Comparison Study of Similarity Measures in Rough Sets Clustering
    Szederjesi-Dragomir, Arnold
    Gaceanu, Radu D.
    Pop, Horia F.
    Sarbu, Costel
    2019 IEEE 15TH INTERNATIONAL SCIENTIFIC CONFERENCE ON INFORMATICS (INFORMATICS 2019), 2019, : 37 - 42
  • [5] A note on "Similarity and dissimilarity measures between fuzzy sets: A formal relational study" and "Additive similarity and dissimilarity measures"
    Couso, Ines
    Sanchez, Luciano
    FUZZY SETS AND SYSTEMS, 2020, 390 (390) : 183 - 187
  • [6] A Comparative Analysis of Dissimilarity Measures for Clustering Categorical Data
    Xavier-Junior, Joao C.
    Canuto, Anne M. P.
    Almeida, Noriedson D.
    Goncalves, Luiz M. G.
    2013 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2013,
  • [7] Additive similarity and dissimilarity measures
    Couso, Ines
    Sanchez, Luciano
    FUZZY SETS AND SYSTEMS, 2017, 322 : 35 - 53
  • [8] Neural gas clustering for dissimilarity data with continuous prototypes
    Hasenfuss, Alexander
    Hammer, Barbara
    Schleif, Frank-Michael
    Villmann, Thomas
    COMPUTATIONAL AND AMBIENT INTELLIGENCE, 2007, 4507 : 539 - +
  • [9] Comparison of similarity measures for clustering Turkish documents
    Madylova, Ainura
    Oguducu, Sule Guenduez
    INTELLIGENT DATA ANALYSIS, 2009, 13 (05) : 815 - 832
  • [10] A comparative study of string dissimilarity measures in structural clustering
    Fred, ALN
    Leitao, JMN
    INTERNATIONAL CONFERENCE ON ADVANCES IN PATTERN RECOGNITION, 1999, : 385 - 394