A novel hierarchical clustering algorithm for gene sequences

Cited by: 73
Authors
Wei, Dan [1, 2, 3]
Jiang, Qingshan [1]
Wei, Yanjie [1]
Wang, Shengrui [4]
Affiliations
[1] Chinese Acad Sci, Shenzhen Key Lab High Performance Data Min, Shenzhen Inst Adv Technol, Shenzhen, Peoples R China
[2] Xiamen Univ, Dept Cognit Sci, Xiamen, Peoples R China
[3] Xiamen Univ, Fujian Key Lab Brain Intelligent Syst, Xiamen, Peoples R China
[4] Univ Sherbrooke, Dept Comp Sci, Sherbrooke, PQ J1K 2R1, Canada
Source
BMC BIOINFORMATICS | 2012 / Vol. 13
Funding
National Natural Science Foundation of China;
Keywords
ALIGNMENT; DNA; PROTEIN; DISSIMILARITY; CLASSIFICATION; FREQUENCIES; SIMILARITY; DISTANCE; ENTROPY; SETS;
DOI
10.1186/1471-2105-13-174
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
Background: Clustering DNA sequences into functional groups is an important problem in bioinformatics. We propose a new alignment-free algorithm, mBKM, based on a new distance measure, DMk, for clustering gene sequences. The method transforms DNA sequences into feature vectors that capture the occurrence, location and order relation of k-tuples in each sequence. A hierarchical procedure is then applied to cluster the DNA sequences based on these feature vectors. Results: The proposed distance measure and clustering method are evaluated by clustering functionally related genes and by phylogenetic analysis. The method is also compared with BlastClust, CD-HIT-EST and several other methods. The experimental results show that our method is effective in classifying DNA sequences with similar biological characteristics and in discovering the underlying relationships among the sequences. Conclusions: We introduce a novel clustering algorithm based on a new sequence similarity measure. It is effective in classifying DNA sequences with similar biological characteristics and in discovering the relationships among the sequences.
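The general idea in the abstract (alignment-free k-tuple feature vectors followed by a hierarchical grouping step) can be illustrated with a minimal Python sketch. This is not the authors' DMk measure or their mBKM algorithm, neither of which is specified in this record. As stand-ins it assumes plain Euclidean distance and average-linkage agglomerative merging, and it models only the occurrence and mean location of each k-tuple, not the order relation. The function names `ktuple_features`, `euclidean` and `agglomerative` are hypothetical.

```python
from itertools import product
from math import sqrt


def ktuple_features(seq, k=2):
    """Illustrative feature vector: relative frequency and mean relative
    position of every k-tuple over the alphabet ACGT (an assumption, not DMk)."""
    n = len(seq) - k + 1  # number of overlapping k-tuples in the sequence
    if n <= 0:
        raise ValueError("sequence is shorter than k")
    tuples = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = {t: 0 for t in tuples}
    pos_sum = {t: 0 for t in tuples}
    for i in range(n):
        t = seq[i:i + k]
        if t in counts:  # skip k-tuples containing other symbols, e.g. N
            counts[t] += 1
            pos_sum[t] += i
    freq = [counts[t] / n for t in tuples]
    mean_pos = [(pos_sum[t] / counts[t]) / n if counts[t] else 0.0
                for t in tuples]
    return freq + mean_pos


def euclidean(a, b):
    """Euclidean distance, used here as a stand-in distance measure."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def agglomerative(vectors, n_clusters):
    """Average-linkage agglomerative clustering down to n_clusters groups.
    Returns a list of clusters, each a sorted list of input indices."""
    clusters = [[i] for i in range(len(vectors))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # mean pairwise distance between the two candidate clusters
                total = sum(euclidean(vectors[i], vectors[j])
                            for i in clusters[a] for j in clusters[b])
                d = total / (len(clusters[a]) * len(clusters[b]))
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]  # merge the closest pair
        del clusters[b]
    return [sorted(c) for c in clusters]


if __name__ == "__main__":
    seqs = ["AAAAAAAAAA", "AAAAAAAAAC", "GTGTGTGTGT", "GTGTGTGTGA"]
    vectors = [ktuple_features(s, k=2) for s in seqs]
    print(agglomerative(vectors, n_clusters=2))
```

In this toy input the two A-rich sequences and the two GT-repeat sequences end up in separate groups. Real gene sequences would need a larger k and a distance measure designed for the task, which is what the paper's DMk is proposed to provide.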
Pages: 15