Performance comparison of gene family clustering methods with expert curated gene family data set in Arabidopsis thaliana

Cited by: 4
Authors
Yang, Kuan [2, 3]
Zhang, Liqing [1, 3]
Affiliations
[1] Virginia Tech, Dept Comp Sci, Blacksburg, VA 24061 USA
[2] Virginia Tech, Virginia Bioinformat Inst, Blacksburg, VA 24061 USA
[3] Virginia Tech, Program Genet Bioinformat & Computat Biol, Blacksburg, VA 24061 USA
Funding
National Science Foundation (USA);
Keywords
Arabidopsis; complete linkage; gene family; hierarchical clustering algorithm; K-means clustering; single linkage; TribeMCL;
DOI
10.1007/s00425-008-0748-7
Chinese Library Classification
Q94 [Botany];
Discipline classification code
071001;
Abstract
With the exponential growth of genomics data, the demand for reliable clustering methods is increasing every day. Despite the wide usage of many clustering algorithms, the accuracy of these algorithms has been evaluated mostly on simulated data sets and seldom on real biological data for which a "correct answer" is available. In order to address this issue, we use the manually curated high-quality Arabidopsis thaliana gene family database as a "gold standard" to conduct a comprehensive comparison of the accuracies of four widely used clustering methods including K-means, TribeMCL, single-linkage clustering and complete-linkage clustering. We compare the results from running different clustering methods on two matrices: the E-value matrix and the k-tuple distance matrix. The E-value matrix is computed based on BLAST E-values. The k-tuple distance matrix is computed based on the difference in tuple frequencies. The TribeMCL with the E-value matrix performed best, with the Inflation parameter (=1.15) tuned considerably lower than what has been suggested previously (=2). The single-linkage clustering method with the E-value matrix was second best. Single-linkage clustering, K-means clustering, complete-linkage clustering, and TribeMCL with a k-tuple distance matrix performed reasonably well. Complete-linkage clustering with the k-tuple distance matrix performed the worst.
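As a rough illustration of two ideas named in the abstract, the sketch below builds a k-tuple distance matrix from tuple frequencies and then clusters it with either single or complete linkage. It is a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not give the exact distance formula, so `ktuple_distance` uses a hypothetical choice (half the summed absolute difference of relative k-tuple frequencies), and the `threshold` cut-off, the function names, and the toy sequences are all illustrative. The BLAST-based E-value matrix, K-means, and TribeMCL are not reproduced here.

```python
from collections import Counter
from itertools import combinations


def ktuple_counts(seq, k):
    """Count every overlapping k-tuple in a sequence (assumes len(seq) >= k)."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def ktuple_distance(a, b, k=2):
    """Assumed distance: half the L1 difference of relative k-tuple frequencies.

    Ranges from 0.0 (identical tuple profiles) to 1.0 (no shared tuples).
    """
    ca, cb = ktuple_counts(a, k), ktuple_counts(b, k)
    na, nb = sum(ca.values()), sum(cb.values())
    return 0.5 * sum(abs(ca[t] / na - cb[t] / nb) for t in set(ca) | set(cb))


def distance_matrix(seqs, k=2):
    """Pairwise distances keyed by an unordered pair of sequence names."""
    return {frozenset((x, y)): ktuple_distance(seqs[x], seqs[y], k)
            for x, y in combinations(seqs, 2)}


def agglomerate(names, dist, threshold, linkage=min):
    """Naive agglomerative clustering, stopped by a distance cut-off.

    linkage=min gives single linkage; linkage=max gives complete linkage.
    """
    clusters = [[n] for n in names]

    def cluster_dist(c1, c2):
        return linkage(dist[frozenset((x, y))] for x in c1 for y in c2)

    while len(clusters) > 1:
        # Find the closest pair of clusters under the chosen linkage.
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda p: cluster_dist(clusters[p[0]], clusters[p[1]]))
        if cluster_dist(clusters[i], clusters[j]) > threshold:
            break  # nothing left that is close enough to merge
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # safe: j > i, so index i is unaffected
    return sorted(sorted(c) for c in clusters)


if __name__ == "__main__":
    # Toy protein-like strings, invented for illustration only.
    toy = {"g1": "MKVLAAGIV", "g2": "MKVLAAGLV", "g3": "PQRSTWYHH"}
    d = distance_matrix(toy, k=2)
    print(agglomerate(list(toy), d, threshold=0.5, linkage=min))
    print(agglomerate(list(toy), d, threshold=0.5, linkage=max))
```

The two linkages differ only in how the distance between clusters is summarized: single linkage merges on the closest pair of members and can chain loosely related genes together, whereas complete linkage requires every pair of members to be within the cut-off.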
Pages: 439-447
Page count: 9