Distance Functions, Clustering Algorithms and Microarray Data Analysis

Cited by: 23

Authors:

Giancarlo, Raffaele [1]

Lo Bosco, Giosue [1]

Pinello, Luca [1]

Affiliations:

[1] Univ Palermo, Dipartimento Matemat & Informat, I-90133 Palermo, Italy

Source:

LEARNING AND INTELLIGENT OPTIMIZATION | 2010 / Vol. 6073

Keywords:

GENE-EXPRESSION DATA; VALIDATION;

DOI:

10.1007/978-3-642-13800-3_10

Chinese Library Classification number:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Distance functions are a fundamental ingredient of classification and clustering procedures, and this also holds true in the particular case of microarray data. In the general data mining and classification literature, functions such as Euclidean distance or Pearson correlation have gained the status of de facto standards thanks to a considerable amount of experimental validation. For microarray data, the issue of which distance function "works best" has been investigated, but no final conclusion has been reached. The aim of this paper is to shed further light on that issue. We present an experimental study, involving several distances, assessing (a) their intrinsic separation ability and (b) their predictive power when used in conjunction with clustering algorithms. The experiments have been carried out on six benchmark microarray datasets, for each of which the "gold solution" is known. We have used both Hierarchical and K-means clustering algorithms, with external validation criteria as evaluation tools. From the methodological point of view, the main result of this study is a ranking of those measures in terms of their intrinsic and clustering abilities, which also highlights the correlations between the two. Pragmatically, the outcomes of the experiments indicate that Minkowski, cosine and Pearson correlation distances seem to be the best choices for microarray data analysis.
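
As an illustration of the distance functions named in the abstract, the following is a minimal Python sketch of Minkowski, Euclidean, cosine and Pearson correlation distances between two expression profiles. The function names and the "1 minus similarity" convention for the cosine and Pearson distances are assumptions made for this example; the paper may define these measures differently.

```python
import math


def minkowski(x, y, p=2.0):
    """Minkowski distance of order p (p=2 is Euclidean, p=1 is Manhattan)."""
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)


def euclidean(x, y):
    """Euclidean distance, the p=2 special case of Minkowski."""
    return minkowski(x, y, p=2.0)


def cosine_distance(x, y):
    """1 - cosine similarity (assumed convention); needs non-zero vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return 1.0 - dot / (norm_x * norm_y)


def pearson_distance(x, y):
    """1 - Pearson correlation (assumed convention); needs non-constant vectors."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Pearson correlation is the cosine similarity of the mean-centered vectors.
    return cosine_distance([a - mean_x for a in x], [b - mean_y for b in y])
```

Under these conventions, two profiles that differ only by a positive scale factor have a Pearson distance of about 0, while their Euclidean distance can still be large, which is one reason the choice of distance matters for expression data.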

Pages: 125-138

Page count: 14

