The analysis of genomics data needs to become as automated as its generation. Here we present a novel data-mining approach to predicting protein functional class from sequence. This method is based on a combination of inductive logic programming clustering and rule learning. We demonstrate the effectiveness of this approach on the M. tuberculosis and E. coli genomes, and identify biologically interpretable rules that predict protein functional class using only information available from the sequence. These rules predict 65% of the ORFs with no assigned function in M. tuberculosis and 24% of those in E. coli, with an estimated accuracy of 60-80% (depending on the level of functional assignment). The rules are founded on a combination of remote-homology detection, convergent evolution and horizontal gene transfer. We identify rules that predict protein functional class even in the absence of detectable sequence or structural homology. These rules give insight into the evolutionary history of M. tuberculosis and E. coli.
Copyright (C) 2000 John Wiley & Sons, Ltd.
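The abstract reports two distinct quantities: the share of unassigned ORFs for which some rule fires (65% and 24%) and the accuracy of those predictions (60-80%). The minimal Python sketch below illustrates that distinction, assuming an already-learned, ordered rule list over a few simple sequence-derived attributes. The attributes (`length`, `hydrophobic_frac`), the thresholds, the class labels `"A"`/`"B"`, the toy sequences and all helper names are hypothetical. The sketch does not reproduce the authors' inductive logic programming or rule-learning step; it only shows how a rule set could be applied and scored.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical attribute set: residues counted as hydrophobic for this sketch.
HYDROPHOBIC = set("AVILMFWC")


def features(seq: str) -> Dict[str, float]:
    """Simple attributes computable from the amino-acid sequence alone (illustrative only)."""
    n = len(seq)
    hydrophobic = sum(1 for aa in seq if aa in HYDROPHOBIC)
    return {"length": n, "hydrophobic_frac": hydrophobic / n if n else 0.0}


@dataclass
class Rule:
    condition: Callable[[Dict[str, float]], bool]
    predicted_class: str


def predict(rules: List[Rule], feats: Dict[str, float]) -> Optional[str]:
    """Ordered rule list: the first rule whose condition holds assigns the class."""
    for rule in rules:
        if rule.condition(feats):
            return rule.predicted_class
    return None  # no rule fires: the ORF stays unassigned


def coverage_and_accuracy(rules: List[Rule], labelled: List[Tuple[str, str]]) -> Tuple[float, float]:
    """Coverage: share of ORFs that receive any prediction.
    Accuracy: share of those predictions that match the known class."""
    pairs = [(predict(rules, features(seq)), label) for seq, label in labelled]
    covered = [(pred, label) for pred, label in pairs if pred is not None]
    coverage = len(covered) / len(pairs) if pairs else 0.0
    accuracy = sum(pred == label for pred, label in covered) / len(covered) if covered else 0.0
    return coverage, accuracy


# Hypothetical rules and toy ORFs with known classes, invented for illustration.
rules = [
    Rule(lambda f: f["hydrophobic_frac"] > 0.5, "A"),
    Rule(lambda f: f["length"] > 300, "B"),
]
labelled = [
    ("LLLLVVVVAK", "A"),  # highly hydrophobic: rule 1 fires, correct
    ("MKT" * 120, "B"),   # long, not hydrophobic: rule 2 fires, correct
    ("MKDEEKR", "B"),     # short and polar: no rule fires
    ("AVILMFWCG", "B"),   # hydrophobic: rule 1 fires, wrong class
]
cov, acc = coverage_and_accuracy(rules, labelled)
print(f"coverage={cov:.2f} accuracy={acc:.2f}")
```

Here three of the four toy ORFs receive a prediction (coverage 0.75) and two of those three are correct (accuracy about 0.67). In a real evaluation, accuracy would be estimated on ORFs whose class is already known before the rules are applied to ORFs with no assigned function.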