Distance Functions, Clustering Algorithms and Microarray Data Analysis

Cited by: 23
Authors
Giancarlo, Raffaele [1 ]
Lo Bosco, Giosue [1 ]
Pinello, Luca [1 ]
Affiliation
[1] Univ Palermo, Dipartimento Matemat & Informat, I-90133 Palermo, Italy
Keywords
GENE-EXPRESSION DATA; VALIDATION;
DOI
10.1007/978-3-642-13800-3_10
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Distance functions are a fundamental ingredient of classification and clustering procedures, and this also holds in the particular case of microarray data. In the general data mining and classification literature, functions such as Euclidean distance or Pearson correlation have gained the status of de facto standards thanks to a considerable amount of experimental validation. For microarray data, the question of which distance function "works best" has been investigated, but no final conclusion has been reached. The aim of this paper is to shed further light on that issue. We present an experimental study, involving several distances, that assesses (a) their intrinsic separation ability and (b) their predictive power when used in conjunction with clustering algorithms. The experiments were carried out on six benchmark microarray datasets, for each of which the "gold solution" is known. We used both Hierarchical and K-means clustering algorithms, with external validation criteria as evaluation tools. From the methodological point of view, the main result of this study is a ranking of those measures in terms of their intrinsic and clustering abilities, which also highlights the correlations between the two. Pragmatically, the outcomes of the experiments indicate that Minkowski, cosine and Pearson correlation distances seem to be the best choice for microarray data analysis.
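As an illustration of the three distances named in the abstract, the following minimal Python sketch uses their common textbook definitions. The abstract does not give the exact formulas or the specific external validation criteria used in the paper, so the function names, the choice of 1 minus the correlation as the Pearson distance, and the Rand index as an example of an external criterion are all assumptions made here for illustration.

```python
import math
from itertools import combinations


def minkowski(x, y, p=2):
    """Minkowski distance of order p (p=1 is Manhattan, p=2 is Euclidean)."""
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)


def cosine_distance(x, y):
    """One minus the cosine of the angle between x and y (assumed definition)."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return 1.0 - dot / (norm_x * norm_y)


def pearson_distance(x, y):
    """One minus the Pearson correlation: cosine distance of mean-centered vectors."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    return cosine_distance([a - mean_x for a in x], [b - mean_y for b in y])


def rand_index(labels_true, labels_pred):
    """Fraction of item pairs on which two partitions agree.

    Shown only as one example of an external validation criterion; the
    abstract does not say which criteria the study used.
    """
    pairs = list(combinations(range(len(labels_true)), 2))
    agree = sum(
        (labels_true[i] == labels_true[j]) == (labels_pred[i] == labels_pred[j])
        for i, j in pairs
    )
    return agree / len(pairs)


if __name__ == "__main__":
    # Two hypothetical expression profiles, not data from the paper.
    g1 = [1.0, 2.0, 3.0, 4.0]
    g2 = [2.0, 4.0, 6.0, 9.0]
    print(minkowski(g1, g2, p=2), cosine_distance(g1, g2), pearson_distance(g1, g2))
    print(rand_index([0, 0, 1, 1], [1, 1, 0, 0]))
```

Cosine and Pearson distances ignore overall scale and respond to the shape of a profile, whereas Minkowski distances respond to magnitude as well, which is one reason a comparison like the one described in the abstract can rank them differently.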
Pages: 125-138
Page count: 14