Distance Functions, Clustering Algorithms and Microarray Data Analysis

Cited by: 23
Authors
Giancarlo, Raffaele [1]
Lo Bosco, Giosue [1]
Pinello, Luca [1]
Affiliation
[1] Univ Palermo, Dipartimento Matemat & Informat, I-90133 Palermo, Italy
Keywords
GENE-EXPRESSION DATA; VALIDATION
DOI
10.1007/978-3-642-13800-3_10
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Distance functions are a fundamental ingredient of classification and clustering procedures, and this holds for microarray data as well. In the general data mining and classification literature, functions such as Euclidean distance and Pearson correlation have become de facto standards thanks to a considerable amount of experimental validation. For microarray data, the question of which distance function "works best" has been investigated, but no final conclusion has been reached. The aim of this paper is to shed further light on that issue. We present an experimental study involving several distances, assessing (a) their intrinsic separation ability and (b) their predictive power when used in conjunction with clustering algorithms. The experiments were carried out on six benchmark microarray datasets, for each of which the "gold solution" is known. We used both Hierarchical and K-means clustering algorithms, with external validation criteria as evaluation tools. From the methodological point of view, the main result of this study is a ranking of those measures in terms of their intrinsic and clustering abilities, which also highlights the correlations between the two. Pragmatically, the outcomes of the experiments indicate that Minkowski, cosine and Pearson correlation distances seem to be the best choices for microarray data analysis.
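As a rough illustration of the quantities named in the abstract, the following sketch computes Minkowski, Euclidean, cosine and Pearson-based distances between two expression profiles, and scores a clustering against a known gold solution. The abstract does not give the paper's exact definitions or say which external validation index was used, so the conventions here are assumptions: cosine and Pearson distances are taken as one minus the similarity, and the plain Rand index stands in for the external criterion. All function names are hypothetical.

```python
import math
from itertools import combinations


def minkowski(x, y, p=2.0):
    """Minkowski distance of order p between two equal-length profiles."""
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)


def euclidean(x, y):
    """Euclidean distance, i.e. the Minkowski distance with p = 2."""
    return minkowski(x, y, 2.0)


def cosine_distance(x, y):
    """Assumed convention: 1 - cosine similarity (profiles must be non-zero)."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return 1.0 - dot / (norm_x * norm_y)


def pearson_distance(x, y):
    """Assumed convention: 1 - Pearson correlation.

    Pearson correlation equals the cosine similarity of the
    mean-centered profiles, so the centered vectors are reused here.
    Profiles must not be constant.
    """
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    return cosine_distance([a - mean_x for a in x], [b - mean_y for b in y])


def rand_index(gold, predicted):
    """Fraction of item pairs on which two partitions agree.

    A pair agrees when both partitions put the two items in the same
    cluster, or both put them in different clusters.
    """
    pairs = list(combinations(range(len(gold)), 2))
    agree = sum(
        (gold[i] == gold[j]) == (predicted[i] == predicted[j])
        for i, j in pairs
    )
    return agree / len(pairs)


# Toy expression profiles (made-up values, not from the paper's datasets).
g1 = [1.0, 2.0, 3.0, 4.0]
g2 = [2.0, 4.0, 6.0, 8.0]   # same shape as g1, twice the magnitude
g3 = [4.0, 3.0, 2.0, 1.0]   # reversed trend

d_euclid = euclidean(g1, g2)
d_cosine = cosine_distance(g1, g2)
d_pearson = pearson_distance(g1, g3)
score = rand_index([0, 0, 1, 1], [0, 1, 1, 1])
```

In this toy case the Euclidean distance separates `g1` from `g2` even though the two profiles differ only in scale, while the cosine distance treats them as essentially identical. This kind of difference is what a comparison of distance functions on expression data has to weigh.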
Pages: 125-138
Page count: 14