Distance Functions, Clustering Algorithms and Microarray Data Analysis

Cited by: 23
Authors
Giancarlo, Raffaele [1 ]
Lo Bosco, Giosue [1 ]
Pinello, Luca [1 ]
Affiliation
[1] Univ Palermo, Dipartimento Matemat & Informat, I-90133 Palermo, Italy
Keywords
GENE-EXPRESSION DATA; VALIDATION
DOI
10.1007/978-3-642-13800-3_10
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Distance functions are a fundamental ingredient of classification and clustering procedures, and this also holds in the particular case of microarray data. In the general data mining and classification literature, functions such as Euclidean distance or Pearson correlation have gained the status of de facto standards thanks to a considerable amount of experimental validation. For microarray data, the question of which distance function "works best" has been investigated, but no final conclusion has been reached. The aim of this paper is to shed further light on that issue. To that end, we present an experimental study involving several distances, assessing (a) their intrinsic separation ability and (b) their predictive power when used in conjunction with clustering algorithms. The experiments have been carried out on six benchmark microarray datasets, for each of which the "gold solution" is known. We have used both Hierarchical and K-means clustering algorithms, with external validation criteria as evaluation tools. From the methodological point of view, the main result of this study is a ranking of those measures in terms of their intrinsic and clustering abilities, which also highlights the correlations between the two. Pragmatically, the outcomes of the experiments indicate that Minkowski, cosine and Pearson correlation distances seem to be the best choices for microarray data analysis.
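As a rough illustration of the quantities named in the abstract, the sketch below implements the Minkowski, cosine and Pearson correlation distances for two expression profiles, together with one possible external validation criterion. It is not the authors' code. The function names, the conversion of a correlation r into a distance as 1 - r, and the choice of the Rand index as the external criterion are all assumptions made for this example; the record does not say which external indices the study used.

```python
import math


def minkowski(x, y, p=2):
    """Minkowski distance of order p (p=2 is Euclidean, p=1 is Manhattan)."""
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)


def cosine_distance(x, y):
    """1 - cosine similarity; assumes neither vector is all zeros."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return 1.0 - dot / (norm_x * norm_y)


def pearson_distance(x, y):
    """1 - Pearson correlation (assumed convention); assumes non-constant vectors."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    centered_x = [a - mean_x for a in x]
    centered_y = [b - mean_y for b in y]
    # Pearson correlation is the cosine similarity of the mean-centered vectors.
    return cosine_distance(centered_x, centered_y)


def rand_index(predicted, gold):
    """Fraction of item pairs on which a clustering agrees with the gold solution.

    Illustrative external criterion only; chosen here as an assumption.
    """
    n = len(predicted)
    agree = 0
    total = 0
    for i in range(n):
        for j in range(i + 1, n):
            same_pred = predicted[i] == predicted[j]
            same_gold = gold[i] == gold[j]
            agree += same_pred == same_gold
            total += 1
    return agree / total


if __name__ == "__main__":
    # Hypothetical expression profiles, made up for the example.
    g1 = [1.0, 2.0, 3.0, 4.0]
    g2 = [2.0, 4.0, 6.0, 8.5]
    print(minkowski(g1, g2, p=2), cosine_distance(g1, g2), pearson_distance(g1, g2))
    print(rand_index([0, 0, 1, 1], [1, 1, 0, 0]))
```

Any of these functions could be used to fill a pairwise distance matrix for a hierarchical or K-means-style procedure, with the resulting partition then scored against the known gold solution.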
Pages: 125-138
Page count: 14