Functional analysis of the Escherichia coli genome using the sequence-to-structure-to-function paradigm: identification of proteins exhibiting the glutaredoxin/thioredoxin disulfide oxidoreductase activity

Cited by: 78
Authors
Fetrow, JS [1]
Godzik, A [1]
Skolnick, J [1]
Affiliations
[1] Scripps Res Inst, Dept Mol Biol, La Jolla, CA 92037 USA
Keywords
genome sequencing project; protein function prediction; geometric descriptors; complete genome screening; functional genomics
DOI
10.1006/jmbi.1998.2061
Chinese Library Classification number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification code
071010; 081704
Abstract
The application of an automated method for the screening of protein activity based on the sequence-to-structure-to-function paradigm is presented for the complete Escherichia coli genome. First, the structure of the protein is identified from its sequence using a threading algorithm, which aligns the sequence to the best matching structure in a structural database and extends sequence analysis well beyond the limits of local sequence identity. Then, the active site is identified in the resulting sequence-to-structure alignment using a "fuzzy functional form" (FFF), a three-dimensional descriptor of the active site of a protein. Here, this sequence-to-structure-to-function concept is applied to analysis of the complete E. coli genome, i.e., all E. coli open reading frames (ORFs) are screened for the thiol-disulfide oxidoreductase activity of the glutaredoxin/thioredoxin protein family. We show that the method can identify the active sites in ten sequences that are known or proposed to exhibit this activity. Furthermore, oxidoreductase activity is predicted in two other sequences that have not been identified previously. This method distinguishes protein pairs with similar active sites from protein pairs that are just topological cousins, i.e., those having similar global folds but not necessarily similar active sites. Thus, this method provides a novel approach for extraction of active site and functional information based on three-dimensional structures, rather than simple sequence analysis. Prediction of protein activity is fully automated and easily extendible to new functions. Finally, it is demonstrated here that the method can be applied to complete genome database analysis. © 1998 Academic Press.
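The second step described in the abstract, checking a threaded model against a three-dimensional active-site descriptor, can be illustrated with a small sketch. This is not the authors' implementation: the class names, the `screen` function, the residue pattern, and every distance and tolerance below are hypothetical placeholders chosen only to show the idea of matching required residue types under "fuzzy" C-alpha distance constraints.

```python
from dataclasses import dataclass
from itertools import combinations
from math import dist


@dataclass(frozen=True)
class Residue:
    name: str    # one-letter amino acid code
    ca: tuple    # C-alpha coordinates (x, y, z), in angstroms


@dataclass(frozen=True)
class DistanceRule:
    i: int            # position in the descriptor's residue list
    j: int
    target: float     # expected CA-CA distance (illustrative value)
    tolerance: float  # allowed deviation, the "fuzzy" part


@dataclass(frozen=True)
class Descriptor:
    residues: tuple   # required residue types, in sequence order
    rules: tuple      # DistanceRule constraints between them


def matches(descriptor, candidate):
    """True if the candidate residues have the required types and distances."""
    if any(r.name != want for r, want in zip(candidate, descriptor.residues)):
        return False
    return all(
        abs(dist(candidate[r.i].ca, candidate[r.j].ca) - r.target) <= r.tolerance
        for r in descriptor.rules
    )


def screen(descriptor, model):
    """Return the first index tuple in the model that satisfies the descriptor."""
    k = len(descriptor.residues)
    for idx in combinations(range(len(model)), k):
        if matches(descriptor, [model[n] for n in idx]):
            return idx
    return None


# Hypothetical descriptor: two cysteines and a proline, with made-up distances.
DESCRIPTOR = Descriptor(
    residues=("C", "C", "P"),
    rules=(DistanceRule(0, 1, 5.5, 1.0), DistanceRule(0, 2, 8.0, 1.5)),
)

# Toy "threaded model": residue types from the query, coordinates from a template.
model = [
    Residue("C", (0.0, 0.0, 0.0)),
    Residue("G", (3.8, 0.0, 0.0)),
    Residue("H", (5.0, 3.0, 0.0)),
    Residue("C", (5.5, 0.0, 0.0)),
    Residue("P", (0.0, 8.0, 0.0)),
]

# Same fold (same coordinates) but one cysteine replaced: a "topological cousin".
cousin = [Residue("S", r.ca) if n == 3 else r for n, r in enumerate(model)]

print(screen(DESCRIPTOR, model))
print(screen(DESCRIPTOR, cousin))
```

In this toy setup, the second model shares the geometry of the first but lacks a required residue, so it is rejected; this mirrors the abstract's distinction between proteins with similar active sites and those that merely share a global fold. The exhaustive search over index combinations is used for clarity only and would need pruning for real protein lengths.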
Pages: 703-711
Page count: 9