Selections of data preprocessing methods and similarity metrics for gene cluster analysis

Cited by: 0

Author:

YANG Chunmei

Institution:

Motorola (China) Electronics Ltd.

Source:

Progress in Natural Science | 2006 / Issue 06

Keywords:

gene expression; cluster analysis; data preprocessing; similarity metrics; Rand index

DOI:

Not available

Chinese Library Classification (CLC) number:

Q75-33


Abstract:

Clustering is one of the major exploratory techniques for analyzing gene expression data. High-quality clustering results can be obtained only when a suitable similarity metric is used and the datasets are properly preprocessed. In this study, gene expression datasets with external evaluation criteria were preprocessed by row normalization, column normalization, or base-2 logarithm transformation, and were then clustered by hierarchical clustering, k-means clustering, and self-organizing maps (SOMs), using either the Pearson correlation coefficient or Euclidean distance as the similarity metric. The quality of the resulting clusters was evaluated with the adjusted Rand index. The results show that k-means clustering and SOMs have clear advantages over hierarchical clustering for gene clustering, and that SOMs perform slightly better than k-means when randomly initialized. They also show that hierarchical clustering works best with the Pearson correlation coefficient as the similarity metric and with row-normalized data, whereas k-means clustering and SOMs produce better clusters with Euclidean distance and log-transformed data. These results provide a useful reference for carrying out cluster analysis of gene expression data.
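The preprocessing options, similarity metrics, and evaluation index named in the abstract can be sketched in standard-library Python. This is an illustrative sketch, not the authors' code, and all function names are hypothetical. It assumes genes are rows and conditions are columns, and assumes that "normalization" means z-score standardization (the abstract does not give the formula). It uses 1 minus the Pearson coefficient as the correlation-based distance and implements the Hubert-Arabie form of the adjusted Rand index. The clustering algorithms themselves (hierarchical, k-means, SOM) are not shown.

```python
import math
from collections import Counter
from statistics import mean, pstdev

# Illustrative sketch only. Assumption: rows are genes, columns are conditions.


def normalize_rows(data):
    """Row (per-gene) normalization, ASSUMED here to be z-scoring:
    subtract the row mean and divide by the row's population std. dev."""
    result = []
    for row in data:
        mu, sd = mean(row), pstdev(row)
        # A constant row carries no shape information; map it to zeros.
        result.append([(x - mu) / sd if sd > 0 else 0.0 for x in row])
    return result


def normalize_columns(data):
    """Column (per-condition) normalization: transpose, z-score rows, transpose back."""
    transposed = [list(col) for col in zip(*data)]
    return [list(row) for row in zip(*normalize_rows(transposed))]


def log2_transform(data):
    """Base-2 logarithm of every entry; entries must be strictly positive."""
    return [[math.log2(x) for x in row] for row in data]


def euclidean_distance(x, y):
    """Straight-line distance; sensitive to the absolute level of each profile."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def pearson_distance(x, y):
    """1 - Pearson r: 0 for perfectly correlated profiles, 2 for anti-correlated ones."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("Pearson r is undefined for a constant profile")
    return 1.0 - cov / math.sqrt(sxx * syy)


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index (Hubert-Arabie form) between two labelings of the same items."""
    n = len(labels_true)
    if n != len(labels_pred) or n < 2:
        raise ValueError("need two labelings of the same n >= 2 items")
    # Pairs of items grouped together in both partitions, in the reference, in the prediction.
    together_both = sum(math.comb(c, 2) for c in Counter(zip(labels_true, labels_pred)).values())
    together_true = sum(math.comb(c, 2) for c in Counter(labels_true).values())
    together_pred = sum(math.comb(c, 2) for c in Counter(labels_pred).values())
    expected = together_true * together_pred / math.comb(n, 2)
    max_index = (together_true + together_pred) / 2
    if max_index == expected:  # only when both partitions are trivial and identical
        return 1.0
    return (together_both - expected) / (max_index - expected)


if __name__ == "__main__":
    # log2 gives [[0.0, 1.0, 2.0], [1.0, 2.0, 3.0]]
    profiles = log2_transform([[1.0, 2.0, 4.0], [2.0, 4.0, 8.0]])
    print(pearson_distance(*profiles))    # 0.0: same shape
    print(euclidean_distance(*profiles))  # about 1.732: different level
    print(adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0]))  # 1.0: label names don't matter
```

Because 1 - r ignores each profile's offset and scale, the two example profiles are at distance 0 under the Pearson metric but not under Euclidean distance; this is one reason the choice of preprocessing interacts with the choice of metric. An adjusted Rand index of 1 means the clustering reproduces the reference partition up to label names, while values near 0 (or negative) indicate agreement no better than chance.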


Pages: 607-613

Number of pages: 7

Related literature (50 items in total; items 1-10 shown):

[1] Selections of data preprocessing methods and similarity metrics for gene cluster analysis
Department of Biomedical Engineering and Scientific Instrumentations, Tianjin University, Tianjin 300072, China
Authors unknown
Prog. Nat. Sci., 2006, 6 : 607 - 613
[2] Selections of data preprocessing methods and similarity metrics for gene cluster analysis
Yang Chunmei
Wan Baikun
Gao Xiaofeng
PROGRESS IN NATURAL SCIENCE-MATERIALS INTERNATIONAL, 2006, 16 (06) : 607 - 613
[3] Data preprocessing in cluster analysis of gene expression
Yang, CM
Wan, BK
Gao, XF
CHINESE PHYSICS LETTERS, 2003, 20 (05) : 774 - 777
[4] A Data Preprocessing Method Applied to Cluster Analysis on Stock Data by Kmeans
Xiong, Zhigang
Zhang, Zhongneng
PROCEEDINGS OF THE 2016 INTERNATIONAL CONFERENCE ON INTELLIGENT CONTROL AND COMPUTER APPLICATION, 2016, 30 : 142 - 145
[5] Similarity classifier with generalized mean applied to medical data using different preprocessing methods
Luukka, P
Leppälampi, T
FUZZ-IEEE 2005: PROCEEDINGS OF THE IEEE INTERNATIONAL CONFERENCE ON FUZZY SYSTEMS: BIGGEST LITTLE CONFERENCE IN THE WORLD, 2005 : 79 - 84
[6] Determination of the statistical similarity of the physicochemical measurement data of shale formations based on the methods of cluster analysis
Letkowski, Piotr
Golabek, Andrzej
Budak, Pawel
Szpunar, Tadeusz
Nowak, Robert
Arabas, Jaroslaw
NAFTA-GAZ, 2016, 72 (11) : 910 - 918
[7] On data preprocessing for subspace methods
Bauer, D
PROCEEDINGS OF THE 39TH IEEE CONFERENCE ON DECISION AND CONTROL, VOLS 1-5, 2000 : 2403 - 2408
[8] A SIMILARITY MEASURE FOR CHEMICAL DATA: APPLICATIONS TO CLUSTER ANALYSIS
Kolossvary, Istvan
Wegscheider, Wolfhard
JOURNAL OF CHEMOMETRICS, 1990, 4 (03) : 255 - 266
[9] Gene expression data preprocessing
Herrero, J
Díaz-Uriarte, R
Dopazo, J
BIOINFORMATICS, 2003, 19 (05) : 655 - 656
[10] Patent Similarity Data and Innovation Metrics
Whalen, Ryan
Lungeanu, Alina
DeChurch, Leslie
Contractor, Noshir
JOURNAL OF EMPIRICAL LEGAL STUDIES, 2020, 17 (03) : 615 - 639
