A novel algorithm for detecting multiple covariance and clustering of biological sequences

Cited by: 14
Authors
Shen, Wei [1 ,2 ]
Li, Yan [1 ,2 ]
Affiliations
[1] Third Mil Med Univ, Southwest Hosp, Med Res Ctr, Chongqing 400038, Peoples R China
[2] Third Mil Med Univ, Dept Microbiol, Coll Basic Med Sci, Chongqing 400038, Peoples R China
Source
SCIENTIFIC REPORTS | 2016 / Vol. 6
Funding
National Natural Science Foundation of China;
Keywords
RESIDUE CONTACTS; PROTEIN; GAPDH; IDENTIFICATION; COEVOLUTION; INFORMATION; LIKELIHOOD; ALIGNMENT; FAMILIES;
DOI
10.1038/srep30425
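Chinese Library Classification
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences];
Discipline classification codes
07 ; 0710 ; 09 ;
Abstract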
Single genetic mutations are always followed by a set of compensatory mutations. Thus, multiple changes commonly occur in biological sequences and play crucial roles in maintaining conformational and functional stability. Although many methods are available to detect single mutations or covariant pairs, detecting non-synchronous multiple changes at different sites in sequences remains challenging. Here, we develop a novel algorithm, named Fastcov, to identify multiple correlated changes in biological sequences using an independent pair model followed by a tandem model of site-residue elements based on inter-restriction thinking. Fastcov performed exceptionally well at harvesting co-pairs and detecting multiple covariant patterns. By 10-fold cross-validation using datasets of different scales, the characteristic patterns successfully classified the sequences into target groups with an accuracy of greater than 98%. Moreover, we demonstrated that the multiple covariant patterns represent co-evolutionary modes corresponding to the phylogenetic tree, and provide a new understanding of protein structural stability. In contrast to other methods, Fastcov provides not only a reliable and effective approach to identify covariant pairs but also more powerful functions, including multiple covariance detection and sequence classification, that are most useful for studying the point and compensatory mutations caused by natural selection, drug induction, environmental pressure, etc.
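For orientation, pairwise covariation between alignment columns is often scored with mutual information. The sketch below is a generic baseline of that kind, not the Fastcov algorithm described in the abstract, whose independent pair model and tandem model are not specified in this record. The function names, the toy alignment, and the 0.5-bit threshold are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations
from math import log2


def mutual_information(col_a, col_b):
    """Mutual information (in bits) between two alignment columns."""
    n = len(col_a)
    count_a = Counter(col_a)
    count_b = Counter(col_b)
    count_ab = Counter(zip(col_a, col_b))
    mi = 0.0
    for (a, b), c in count_ab.items():
        p_ab = c / n
        # p_ab / (p_a * p_b), written with raw counts to limit rounding
        mi += p_ab * log2(p_ab * n * n / (count_a[a] * count_b[b]))
    return mi


def covariant_pairs(alignment, threshold=0.5):
    """Return (i, j, score) for column pairs scoring at or above threshold.

    `alignment` is a list of equal-length aligned sequences; `threshold`
    is an arbitrary illustrative cutoff, not a value from the paper.
    """
    length = len(alignment[0])
    cols = [[seq[i] for seq in alignment] for i in range(length)]
    pairs = []
    for i, j in combinations(range(length), 2):
        score = mutual_information(cols[i], cols[j])
        if score >= threshold:
            pairs.append((i, j, score))
    # Highest-scoring pairs first
    return sorted(pairs, key=lambda p: -p[2])


# Hypothetical toy alignment: columns 0 and 1 change together,
# column 2 is conserved, column 3 varies independently of the rest.
toy_alignment = ["ACDA", "ACDG", "GTDA", "GTDG"]
pairs = covariant_pairs(toy_alignment)
```

On this toy input only columns 0 and 1 are reported, since the conserved column and the independently varying column carry no mutual information with any other column. A pairwise score like this does not capture the multiple, non-synchronous changes that the abstract identifies as the harder problem.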
Pages: 8
Related papers
50 in total
  • [31] TreeCluster: Clustering biological sequences using phylogenetic trees
    Balaban, Metin
    Moshiri, Niema
    Mai, Uyen
    Jia, Xingfan
    Mirarab, Siavash
    PLOS ONE, 2019, 14 (08):
  • [32] Clustering biological sequences with dynamic sequence similarity threshold
    Chiu, Jimmy Ka Ho
    Ong, Rick Twee-Hee
    BMC BIOINFORMATICS, 2022, 23 (01)
  • [33] Clustering biological sequences with dynamic sequence similarity threshold
    Jimmy Ka Ho Chiu
    Rick Twee-Hee Ong
    BMC Bioinformatics, 23
  • [34] A Novel Clustering Algorithm for Graphs
    Chen, Dongming
    2009 INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND COMPUTATIONAL INTELLIGENCE, VOL IV, PROCEEDINGS, 2009, : 279 - 283
  • [35] Multiple Base Stations Cooperation: A Novel Clustering Algorithm and Its Energy Efficiency
    Chao Meng
    Tian Liang
    Wei Heng
    Xiaoming Wang
    Wireless Personal Communications, 2016, 86 : 351 - 365
  • [36] Multiple Base Stations Cooperation: A Novel Clustering Algorithm and Its Energy Efficiency
    Meng, Chao
    Liang, Tian
    Heng, Wei
    Wang, Xiaoming
    WIRELESS PERSONAL COMMUNICATIONS, 2016, 86 (02) : 351 - 365
  • [37] A Novel Fast Clustering Algorithm
    Li Xia
    Jiang Sheng-yi
    Su Xiao-ke
    2009 INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND COMPUTATIONAL INTELLIGENCE, VOL IV, PROCEEDINGS, 2009, : 284 - +
  • [38] A novel fuzzy clustering algorithm
    Yang, MS
    Wu, KL
    Yu, J
    2003 IEEE INTERNATIONAL SYMPOSIUM ON COMPUTATIONAL INTELLIGENCE IN ROBOTICS AND AUTOMATION, VOLS I-III, PROCEEDINGS, 2003, : 647 - 652
  • [39] A novel soft clustering algorithm
    Ma, Ruixin
    Wang, Xiao
    Meng, Fancheng
    CEIS 2011, 2011, 15
  • [40] A novel algorithm for data clustering
    Wong, CC
    Chen, CC
    Su, MC
    PATTERN RECOGNITION, 2001, 34 (02) : 425 - 442