A novel hierarchical clustering algorithm for gene sequences

Cited by: 73
Authors
Wei, Dan [1 ,2 ,3 ]
Jiang, Qingshan [1 ]
Wei, Yanjie [1 ]
Wang, Shengrui [4 ]
Affiliations
[1] Chinese Acad Sci, Shenzhen Key Lab High Performance Data Min, Shenzhen Inst Adv Technol, Shenzhen, Peoples R China
[2] Xiamen Univ, Dept Cognit Sci, Xiamen, Peoples R China
[3] Xiamen Univ, Fujian Key Lab Brain Intelligent Syst, Xiamen, Peoples R China
[4] Univ Sherbrooke, Dept Comp Sci, Sherbrooke, PQ J1K 2R1, Canada
Source
BMC BIOINFORMATICS | 2012 / Vol. 13
Funding
National Natural Science Foundation of China;
Keywords
ALIGNMENT; DNA; PROTEIN; DISSIMILARITY; CLASSIFICATION; FREQUENCIES; SIMILARITY; DISTANCE; ENTROPY; SETS;
DOI
10.1186/1471-2105-13-174
Chinese Library Classification
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
Background: Clustering DNA sequences into functional groups is an important problem in bioinformatics. We propose a new alignment-free algorithm, mBKM, based on a new distance measure, DMk, for clustering gene sequences. The method transforms DNA sequences into feature vectors that capture the occurrence, location and order relation of k-tuples in each sequence. A hierarchical procedure is then applied to cluster the DNA sequences based on these feature vectors. Results: The proposed distance measure and clustering method are evaluated by clustering functionally related genes and by phylogenetic analysis. The method is also compared with BlastClust, CD-HIT-EST and several other methods. The experimental results show that our method is effective in classifying DNA sequences with similar biological characteristics and in discovering the underlying relationships among the sequences. Conclusions: We introduce a novel clustering algorithm based on a new sequence similarity measure. It is effective in classifying DNA sequences with similar biological characteristics and in discovering the relationships among the sequences.
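The abstract does not define DMk or mBKM, so the following is only an illustrative sketch of the general alignment-free idea it describes: turning a DNA sequence into a k-tuple feature vector that records both occurrence and location. The function names `ktuple_features` and `euclidean`, the choice of relative frequency plus mean normalized position as features, and the use of Euclidean distance are all assumptions made for illustration; they are not the paper's measure, and the order-relation component is not modeled here.

```python
from itertools import product
from math import sqrt

ALPHABET = "ACGT"


def ktuple_features(seq, k=2):
    """Hypothetical feature vector: for each k-tuple over ACGT, its relative
    frequency followed by its mean normalized position in the sequence.
    This is an illustrative stand-in, not the DMk features from the paper."""
    seq = seq.upper()
    tuples = ["".join(p) for p in product(ALPHABET, repeat=k)]
    counts = {t: 0 for t in tuples}
    pos_sum = {t: 0.0 for t in tuples}
    n_windows = len(seq) - k + 1
    for i in range(n_windows):
        word = seq[i:i + k]
        if word in counts:  # skip windows containing ambiguous bases such as N
            counts[word] += 1
            pos_sum[word] += (i + 1) / n_windows
    features = []
    for t in tuples:
        c = counts[t]
        features.append(c / n_windows if n_windows > 0 else 0.0)  # occurrence
        features.append(pos_sum[t] / c if c else 0.0)             # location
    return features


def euclidean(u, v):
    """Plain Euclidean distance, used here in place of the paper's DMk."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


if __name__ == "__main__":
    a = ktuple_features("ACGTACGT")
    b = ktuple_features("ACGTACGA")
    c = ktuple_features("TTTTTTTT")
    print(euclidean(a, b), euclidean(a, c))
```

A pairwise distance matrix built this way could then be passed to any hierarchical clustering routine; which procedure the authors actually use is not stated in the abstract.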
Pages: 15
Related papers
50 in total
  • [21] A novel hierarchical clustering algorithm with merging strategy based on shared subordinates
    Shi, Jinxin
    Zhu, Qingsheng
    Li, Junnan
    APPLIED INTELLIGENCE, 2022, 52 (08) : 8635 - 8650
  • [22] A Novel Closed-Loop Clustering Algorithm for Hierarchical Load Forecasting
    Zhang, Chi
    Li, Ran
    IEEE TRANSACTIONS ON SMART GRID, 2021, 12 (01) : 432 - 441
  • [23] A novel clustering algorithm based on hierarchical and K-means clustering
    Li Wenchao
    Zhou Yong
    Xia Shixiong
    PROCEEDINGS OF THE 26TH CHINESE CONTROL CONFERENCE, VOL 4, 2007, : 605 - +
  • [24] A novel hierarchical clustering algorithm with merging strategy based on shared subordinates
    Jinxin Shi
    Qingsheng Zhu
    Junnan Li
    Applied Intelligence, 2022, 52 : 8635 - 8650
  • [25] An ensemble agglomerative hierarchical clustering algorithm based on clusters clustering technique and the novel similarity measurement
    Li, Teng
    Rezaeipanah, Amin
    El Din, ElSayed M. Tag
    JOURNAL OF KING SAUD UNIVERSITY-COMPUTER AND INFORMATION SCIENCES, 2022, 34 (06) : 3828 - 3842
  • [26] Large scale hierarchical clustering of protein sequences
    Antje Krause
    Jens Stoye
    Martin Vingron
    BMC Bioinformatics, 6
  • [27] Large scale hierarchical clustering of protein sequences
    Krause, A
    Stoye, J
    Vingron, M
    BMC BIOINFORMATICS, 2005, 6 (1)
  • [28] DENCH: A density-based hierarchical clustering algorithm for gene expression data
    Sun Liang
    Zhao Fang
    Wang Yongji
    CHINESE JOURNAL OF ELECTRONICS, 2007, 16 (01): : 24 - 29
  • [29] Gene sequences clustering and identifying functional domain using a suffix tree algorithm
    Han, Sang Il
    Lee, Sung Gun
    Hwang, Kyu Suk
    Kim, Young Han
    2006 SICE-ICASE INTERNATIONAL JOINT CONFERENCE, VOLS 1-13, 2006, : 2315 - +
  • [30] CLAGen: A tool for clustering and annotating gene sequences using a suffix tree algorithm
    Han, Sang il
    Lee, Sung Gun
    Kim, Kyung-Hoon
    Choi, Chung Jung
    Kim, Young Han
    Hwang, Kyu Suk
    BIOSYSTEMS, 2006, 84 (03) : 175 - 182