A novel algorithm for detecting multiple covariance and clustering of biological sequences

Cited by: 14
Authors
Shen, Wei [1 ,2 ]
Li, Yan [1 ,2 ]
Affiliations
[1] Third Mil Med Univ, Southwest Hosp, Med Res Ctr, Chongqing 400038, Peoples R China
[2] Third Mil Med Univ, Dept Microbiol, Coll Basic Med Sci, Chongqing 400038, Peoples R China
Source
SCIENTIFIC REPORTS | 2016 / Volume 6
Funding
National Natural Science Foundation of China;
Keywords
RESIDUE CONTACTS; PROTEIN; GAPDH; IDENTIFICATION; COEVOLUTION; INFORMATION; LIKELIHOOD; ALIGNMENT; FAMILIES;
DOI
10.1038/srep30425
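Chinese Library Classification number
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences];
Discipline classification code
07 ; 0710 ; 09 ;
Abstract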
Single genetic mutations are always followed by a set of compensatory mutations. Thus, multiple changes commonly occur in biological sequences and play crucial roles in maintaining conformational and functional stability. Although many methods are available to detect single mutations or covariant pairs, detecting non-synchronous multiple changes at different sites in sequences remains challenging. Here, we develop a novel algorithm, named Fastcov, to identify multiple correlated changes in biological sequences using an independent pair model followed by a tandem model of site-residue elements based on inter-restriction thinking. Fastcov performed exceptionally well at harvesting co-pairs and detecting multiple covariant patterns. By 10-fold cross-validation using datasets of different scales, the characteristic patterns successfully classified the sequences into target groups with an accuracy of greater than 98%. Moreover, we demonstrated that the multiple covariant patterns represent co-evolutionary modes corresponding to the phylogenetic tree, and provide a new understanding of protein structural stability. In contrast to other methods, Fastcov provides not only a reliable and effective approach to identify covariant pairs but also more powerful functions, including multiple covariance detection and sequence classification, that are most useful for studying the point and compensatory mutations caused by natural selection, drug induction, environmental pressure, etc.
Pages: 8