Accurate prediction of protein functional class from sequence in the Mycobacterium tuberculosis and Escherichia coli genomes using data mining

Cited by: 2
Authors
King, RD [1]
Karwath, A
Clare, A
Dehaspe, L
Affiliations
[1] Univ Coll Wales, Dept Comp Sci, Aberystwyth SY23 3DB, Dyfed, Wales
[2] PharmaDM, B-3001 Louvain, Belgium
Keywords
machine learning; clustering; ILP; bioinformatics
DOI
10.1002/1097-0061(200012)17:4<283::AID-YEA52>3.0.CO;2-F
Chinese Library Classification
Q5 [Biochemistry]; Q7 [Molecular Biology]
Subject classification codes
071010; 081704
Abstract
The analysis of genomics data needs to become as automated as its generation. Here we present a novel data-mining approach to predicting protein functional class from sequence. This method is based on a combination of inductive logic programming clustering and rule learning. We demonstrate the effectiveness of this approach on the M. tuberculosis and E. coli genomes, and identify biologically interpretable rules which predict protein functional class from information only available from the sequence. These rules predict 65% of the ORFs with no assigned function in M. tuberculosis and 24% of those in E. coli, with an estimated accuracy of 60-80% (depending on the level of functional assignment). The rules are founded on a combination of detection of remote homology, convergent evolution and horizontal gene transfer. We identify rules that predict protein functional class even in the absence of detectable sequence or structural homology. These rules give insight into the evolutionary history of M. tuberculosis and E. coli. Copyright (C) 2000 John Wiley & Sons, Ltd.
Pages: 283-293
Number of pages: 11
Related papers
39 in total
  • [21] Prediction of soluble heterologous protein expression levels in Escherichia coli from sequence-based features and its potential in biopharmaceutical process development
    Dai, XiaoFeng
    Guo, Wenwen
    Long, Quan
    Yang, Yankun
    Harvey, Linda
    McNeil, Brian
    Bai, Zhonghu
    PHARMACEUTICAL BIOPROCESSING, 2014, 2 (03) : 253 - 266
  • [22] Characterization of quinolinate synthases from Escherichia coli, Mycobacterium tuberculosis, and Pyrococcus horikoshii indicates that [4Fe-4S] clusters are common cofactors throughout this class of enzymes
    Saunders, Allison H.
    Griffiths, Amy E.
    Lee, Kyung-Hoon
    Cicchillo, Robert M.
    Tu, Loretta
    Stromberg, Jeffrey A.
    Krebs, Carsten
    Booker, Squire J.
    BIOCHEMISTRY, 2008, 47 (41) : 10999 - 11012
  • [23] Improving protein complex prediction by reconstructing a high-confidence protein-protein interaction network of Escherichia coli from different physical interaction data sources
    Taghipour, Shirin
    Zarrineh, Peyman
    Ganjtabesh, Mohammad
    Nowzari-Dalini, Abbas
    BMC BIOINFORMATICS, 2017, 18
  • [25] Predicting Protein Stability Change upon Double Mutation from Partial Sequence Information Using Data Mining Approach
    Lai, Lien-Fu
    Wu, Chao-Chin
    Huang, Liang-Tsung
    ADVANCED INTELLIGENT COMPUTING THEORIES AND APPLICATIONS, 2010, 6215 : 664 - +
  • [26] D-ribose-5-phosphate isomerase B from Escherichia coli is also a functional D-allose-6-phosphate isomerase, while the Mycobacterium tuberculosis enzyme is not
    Roos, Annette K.
    Mariano, Sandrine
    Kowalinski, Eva
    Salmon, Laurent
    Mowbray, Sherry L.
    JOURNAL OF MOLECULAR BIOLOGY, 2008, 382 (03) : 667 - 679
  • [27] Functional insights into the late embryogenesis abundant (LEA) protein family from Dendrobium officinale (Orchidaceae) using an Escherichia coli system
    Ling, Hong
    Zeng, Xu
    Guo, Shunxing
    SCIENTIFIC REPORTS, 2016, 6
  • [29] Predicting the functional state of protein kinases using interpretable graph neural networks from sequence and structural data
    Ravichandran, Ashwin
    Araque, Juan C.
    Lawson, John W.
    PROTEINS-STRUCTURE FUNCTION AND BIOINFORMATICS, 2024, 92 (05) : 623 - 636
  • [30] Mycobacterium tuberculosis cAMP Receptor Protein (Rv3676) Differs from the Escherichia coli Paradigm in Its cAMP Binding and DNA Binding Properties and Transcription Activation Properties
    Stapleton, Melanie
    Haq, Ihtshamul
    Hunt, Debbie M.
    Arnvig, Kristine B.
    Artymiuk, Peter J.
    Buxton, Roger S.
    Green, Jeffrey
    JOURNAL OF BIOLOGICAL CHEMISTRY, 2010, 285 (10) : 7016 - 7027