Selections of data preprocessing methods and similarity metrics for gene cluster analysis

Cited by: 0
Author: YANG Chunmei (Motorola (China) Electronics Ltd.)
Keywords: gene expression; cluster analysis; data preprocessing; similarity metrics; Rand index
DOI: not available
Chinese Library Classification (CLC) number: Q75-33
Abstract
Clustering is one of the major exploratory techniques for gene expression data analysis. High-quality clustering results can be obtained only when suitable similarity metrics are used and the datasets are properly preprocessed. In this study, gene expression datasets with external evaluation criteria were preprocessed by normalization by row, normalization by column, or base-2 logarithm transformation, and were then clustered by hierarchical clustering, k-means clustering, and self-organizing maps (SOMs), using either the Pearson correlation coefficient or the Euclidean distance as the similarity metric. The quality of the resulting clusters was evaluated with the adjusted Rand index. The results show that k-means clustering and SOMs have clear advantages over hierarchical clustering for gene clustering, and that SOMs perform slightly better than k-means under random initialization. They also show that hierarchical clustering works best with the Pearson correlation coefficient as the similarity metric and with row-normalized data, whereas k-means clustering and SOMs produce better clusters with the Euclidean distance and log-transformed data. These results provide a useful reference for carrying out cluster analysis of gene expression data.
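The preprocessing options, similarity metrics, and evaluation criterion named in the abstract can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not define row and column normalization, so z-score standardization (subtract the mean, divide by the standard deviation) is assumed; the Pearson coefficient is turned into a distance as 1 - r; and the log2 transform assumes strictly positive values such as expression ratios. The adjusted Rand index follows the standard Hubert-Arabie formula. All function names are hypothetical.

```python
import math
from collections import Counter


def normalize_rows(matrix):
    """Z-score each row (one gene's profile). ASSUMPTION: the paper's row normalization may differ."""
    out = []
    for row in matrix:
        mean = sum(row) / len(row)
        sd = math.sqrt(sum((x - mean) ** 2 for x in row) / len(row))
        # A constant row has sd == 0; map it to all zeros instead of dividing by zero.
        out.append([(x - mean) / sd if sd > 0 else 0.0 for x in row])
    return out


def normalize_columns(matrix):
    """Z-score each column (one sample): transpose, normalize rows, transpose back."""
    transposed = [list(col) for col in zip(*matrix)]
    return [list(row) for row in zip(*normalize_rows(transposed))]


def log2_transform(matrix):
    """Base-2 logarithm of every entry; assumes all values are strictly positive (e.g. ratios)."""
    return [[math.log2(x) for x in row] for row in matrix]


def euclidean(u, v):
    """Euclidean distance between two expression profiles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def pearson_distance(u, v):
    """1 - Pearson r: 0 for perfectly correlated profiles, 2 for perfectly anticorrelated ones."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    ss_u = sum((a - mu) ** 2 for a in u)
    ss_v = sum((b - mv) ** 2 for b in v)
    if ss_u == 0 or ss_v == 0:
        raise ValueError("Pearson correlation is undefined for a constant profile")
    return 1.0 - cov / math.sqrt(ss_u * ss_v)


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index (Hubert-Arabie form) between two partitions of the same items."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("both labelings must cover the same items")
    n = len(labels_true)
    if n < 2:
        raise ValueError("need at least two items")
    # Pair counts from the contingency table and from each partition's cluster sizes.
    pairs_ij = sum(math.comb(c, 2) for c in Counter(zip(labels_true, labels_pred)).values())
    pairs_a = sum(math.comb(c, 2) for c in Counter(labels_true).values())
    pairs_b = sum(math.comb(c, 2) for c in Counter(labels_pred).values())
    expected = pairs_a * pairs_b / math.comb(n, 2)
    max_index = (pairs_a + pairs_b) / 2
    if max_index == expected:
        # Only happens when both partitions are all singletons or both are a single
        # cluster; the partitions are then identical, so return 1.0.
        return 1.0
    return (pairs_ij - expected) / (max_index - expected)
```

For example, `log2_transform` maps the profiles [1, 2, 4] and [8, 4, 2] to [0, 1, 2] and [3, 2, 1], whose Pearson distance is 2 (perfect anticorrelation), while their Euclidean distance depends on absolute levels. The index is also invariant to relabelling: `adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0])` is 1.0.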
Pages: 607-613 (7 pages)
Related papers (50 in total)
  • [1] Selections of data preprocessing methods and similarity metrics for gene cluster analysis
    Department of Biomedical Engineering and Scientific Instrumentations, Tianjin University, Tianjin 300072, China
    Authors not listed
    Prog. Nat. Sci., 2006, 6 : 607 - 613
  • [2] Selections of data preprocessing methods and similarity metrics for gene cluster analysis
    Yang Chunmei
    Wan Baikun
    Gao Xiaofeng
    PROGRESS IN NATURAL SCIENCE-MATERIALS INTERNATIONAL, 2006, 16 (06) : 607 - 613
  • [3] Data preprocessing in cluster analysis of gene expression
    Yang, CM
    Wan, BK
    Gao, XF
    CHINESE PHYSICS LETTERS, 2003, 20 (05) : 774 - 777
  • [4] A Data Preprocessing Method Applied to Cluster Analysis on Stock Data by Kmeans
    Xiong, Zhigang
    Zhang, Zhongneng
    PROCEEDINGS OF THE 2016 INTERNATIONAL CONFERENCE ON INTELLIGENT CONTROL AND COMPUTER APPLICATION, 2016, 30 : 142 - 145
  • [5] Similarity classifier with generalized mean applied to medical data using different preprocessing methods
    Luukka, P
    Leppälampi, T
    FUZZ-IEEE 2005: PROCEEDINGS OF THE IEEE INTERNATIONAL CONFERENCE ON FUZZY SYSTEMS: BIGGEST LITTLE CONFERENCE IN THE WORLD, 2005, : 79 - 84
  • [6] Determination of the statistical similarity of the physicochemical measurement data of shale formations based on the methods of cluster analysis
    Letkowski, Piotr
    Golabek, Andrzej
    Budak, Pawel
    Szpunar, Tadeusz
    Nowak, Robert
    Arabas, Jaroslaw
    NAFTA-GAZ, 2016, 72 (11) : 910 - 918
  • [7] On data preprocessing for subspace methods
    Bauer, D
    PROCEEDINGS OF THE 39TH IEEE CONFERENCE ON DECISION AND CONTROL, VOLS 1-5, 2000, : 2403 - 2408
  • [8] A SIMILARITY MEASURE FOR CHEMICAL DATA: APPLICATIONS TO CLUSTER ANALYSIS
    Kolossvary, Istvan
    Wegscheider, Wolfhard
    JOURNAL OF CHEMOMETRICS, 1990, 4 (03) : 255 - 266
  • [9] Gene expression data preprocessing
    Herrero, J
    Díaz-Uriarte, R
    Dopazo, J
    BIOINFORMATICS, 2003, 19 (05) : 655 - 656
  • [10] Patent Similarity Data and Innovation Metrics
    Whalen, Ryan
    Lungeanu, Alina
    DeChurch, Leslie
    Contractor, Noshir
    JOURNAL OF EMPIRICAL LEGAL STUDIES, 2020, 17 (03) : 615 - 639