Exploring the extremes of sequence/structure space with ensemble fold recognition in the program Phyre

Cited by: 345
Authors
Bennett-Lovsey, Riccardo M. [1]
Herbert, Alex D. [1]
Sternberg, Michael J. E. [1]
Kelley, Lawrence A. [1]
Affiliation
[1] Univ London Imperial Coll Sci Technol & Med, Dept Mol Biosci, Struct Bioinformat Grp, Div Mol Biosci, London SW7 2AY, England
Keywords
meta-server; remote homology modelling; fold recognition; protein structure prediction; Phyre; ensemble; profile-profile alignment
DOI
10.1002/prot.21688
Chinese Library Classification
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification codes
071010; 081704
Abstract
Structural and functional annotation of the large and growing database of genomic sequences is a major problem in modern biology. Protein structure prediction by detecting remote homology to known structures is a well-established and successful annotation technique. However, the broad spectrum of evolutionary change that accompanies the divergence of close homologues into remote homologues cannot easily be captured by a single algorithm. Recent advances to tackle this problem have involved the use of multiple predictive algorithms available on the Internet. Here we demonstrate how such ensembles of predictors can be designed in-house under controlled conditions, and how they permit significant improvements in recognition when a concept taken from protein loop energetics is applied to the general problem of 3D clustering. We have developed a stringent test that simulates the situation where a protein sequence of interest is submitted to multiple different algorithms and none of them can make a confident (95%) correct assignment. We implemented a meta-server prediction method, Phyre, that exploits the benefits of a controlled environment for the component methods. At 95% precision or higher, Phyre identified 64.0% of all correct homologous query-template relationships, and 84.0% of the individual test query proteins could be accurately annotated. Compared with the improvement that the single best fold recognition algorithm (as judged in training) achieves over PSI-BLAST, this represents a 29.6% increase in the number of correct homologous query-template relationships and a 46.2% increase in the number of accurately annotated queries. It has been well recognised in fold prediction, in other bioinformatics applications, and in many other areas that ensemble predictions are generally more accurate than any of their individual component methods.
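The "percentage identified at 95% precision or higher" figures can be made concrete with a short sketch. It assumes a benchmark in which every predicted query-template pair carries a confidence score and a correct/incorrect label, and that coverage means true positives above the most permissive acceptable score cutoff divided by all true relationships. The function name coverage_at_precision and its inputs are hypothetical; this is a generic evaluation routine, not the authors' benchmarking code.

```python
from typing import Sequence, Tuple


def coverage_at_precision(
    predictions: Sequence[Tuple[float, bool]],
    total_true: int,
    min_precision: float = 0.95,
) -> float:
    """Fraction of all true relationships recovered at the most permissive
    score cutoff whose precision is still >= min_precision.

    predictions: hypothetical (score, is_correct) pairs; higher score means
                 a more confident query-template assignment.
    total_true:  number of true homologous query-template pairs in the
                 (hypothetical) benchmark.
    """
    if total_true <= 0:
        raise ValueError("total_true must be positive")
    # Rank predictions from most to least confident.
    ranked = sorted(predictions, key=lambda p: p[0], reverse=True)
    best_tp = 0
    tp = 0
    for i, (score, correct) in enumerate(ranked):
        tp += int(correct)
        # A score threshold cannot split tied predictions, so only evaluate
        # cutoffs that fall between distinct scores.
        is_last = i == len(ranked) - 1
        if not is_last and ranked[i + 1][0] == score:
            continue
        precision = tp / (i + 1)
        if precision >= min_precision:
            best_tp = max(best_tp, tp)
    return best_tp / total_true
```

For example, with scored predictions [(0.9, True), (0.8, True), (0.7, True), (0.6, False), (0.5, True)] and five true relationships, only the top three predictions can be accepted at 95% precision, giving a coverage of 0.6. Skipping cutoffs inside runs of tied scores keeps the result consistent with what a real score threshold could achieve.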
However, there is a paucity of information on why ensemble methods are superior, and this question has never been systematically addressed in fold recognition. Here we show that the power of the ensemble stems from noise reduction, achieved by filtering out false-positive matches. The results indicate greater coverage of sequence space and improved model quality, which can consequently reduce the experimental workload of structural genomics initiatives.
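The claim that an ensemble gains its power by filtering out false positives can be illustrated with a simple structural-consensus score, loosely in the spirit of 3D-Jury-style consensus. This is a swapped-in simplification, not Phyre's actual 3D clustering (which, according to the abstract, borrows a concept from protein loop energetics). Each candidate model is scored by its summed similarity to models produced by other component methods, so a spurious hit that no other method reproduces scores near zero. The function name consensus_scores, the similarity matrix, and the min_similarity threshold are all hypothetical.

```python
from typing import List


def consensus_scores(
    similarity: List[List[float]],
    method_of: List[str],
    min_similarity: float = 0.5,
) -> List[float]:
    """Score each candidate model by structural agreement with models
    from *other* component methods.

    similarity[i][j]: hypothetical structural similarity in [0, 1] between
                      candidate models i and j (e.g. a TM-score-like value).
    method_of[i]:     name of the component method that produced model i.
    min_similarity:   hypothetical cutoff below which two models are treated
                      as unrelated and contribute nothing.
    """
    n = len(similarity)
    scores: List[float] = []
    for i in range(n):
        total = 0.0
        for j in range(n):
            # Ignore self-similarity and agreement within the same method,
            # so a method cannot "vote" for its own predictions.
            if i == j or method_of[i] == method_of[j]:
                continue
            if similarity[i][j] >= min_similarity:
                total += similarity[i][j]
        scores.append(total)
    return scores
```

For example, if methods A, B and C each return mutually similar models (pairwise similarity 0.8) and method C also returns a fourth model that resembles none of them (similarity 0.1), the three agreeing models each score 1.6 while the outlier scores 0.0, so any positive cutoff removes it. Excluding same-method pairs is a deliberate choice: agreement only counts as evidence when it comes from independent predictors.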
Pages: 611-625
Number of pages: 15