SELECTION OF A REPRESENTATIVE SET OF STRUCTURES FROM BROOKHAVEN PROTEIN DATA-BANK

Cited by: 52
Authors
BOBERG, J
SALAKOSKI, T
VIHINEN, M
Affiliations
[1] UNIV TURKU,DEPT BIOCHEM,SF-20500 TURKU 50,FINLAND
[2] UNIV TURKU,CTR BIOTECHNOL,SF-20500 TURKU 50,FINLAND
[3] UNIV TURKU,DEPT COMP SCI,SF-20520 TURKU 52,FINLAND
Source
PROTEINS-STRUCTURE FUNCTION AND GENETICS | 1992 / Vol. 14 / Issue 02
Keywords
REPRESENTATIVE PDB STRUCTURES; SEQUENCE CLUSTERING; SIGNIFICANCE OF SEQUENCE SIMILARITY; CLASSIFICATION OF PROTEIN STRUCTURES; AMINO ACID COMPOSITION;
DOI
10.1002/prot.340140212
Chinese Library Classification
Q5 [Biochemistry]; Q7 [Molecular Biology];
Discipline classification code
071010 ; 081704 ;
Abstract
Reliable structural and statistical analyses of three-dimensional protein structures should be based on unbiased data. The Protein Data Bank is highly redundant, containing several entries for identical or very similar sequences. A technique was developed for clustering the known structures based on their sequences and contents of alpha- and beta-structures. First, sequences were aligned pairwise. A representative sample of sequences was then obtained by grouping similar sequences together and selecting a typical representative from each group. The similarity significance threshold needed in the clustering method was found by analyzing similarities of random sequences. Because three-dimensional structures for proteins of the same structural class are generally more conserved than their sequences, the proteins were also clustered according to their contents of secondary structural elements. The results of these clusterings indicate conservation of alpha- and beta-structures even when sequence similarity is relatively low. An unbiased sample of 103 high-resolution structures, representing a wide variety of proteins, was chosen based on the suggestions made by the clustering algorithm. The proteins were divided into structural classes according to their contents and ratios of secondary structural elements. Previous classifications have suffered from subjective views of secondary structures, whereas here the classification was based on backbone geometry. The concise view led to reclassification of some structures. The representative set of structures facilitates unbiased analyses of relationships between protein sequence, function, and structure as well as of structural characteristics.
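The sequence-clustering workflow outlined in the abstract (pairwise comparison, a significance threshold derived from random sequences, grouping, and picking one representative per group) can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the authors' implementation: `difflib.SequenceMatcher` stands in for a real pairwise alignment score, the mean-plus-`k`-standard-deviations cutoff and the single-linkage grouping are assumed choices, and all function names and parameters are hypothetical.

```python
import random
import statistics
from difflib import SequenceMatcher
from itertools import combinations


def similarity(a: str, b: str) -> float:
    """Stand-in similarity in [0, 1]; NOT the alignment score used in the paper."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def random_threshold(seqs, n_pairs=200, k=3.0, seed=0):
    """Assumed cutoff: mean + k * stdev of similarities between shuffled sequences."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_pairs):
        a, b = rng.sample(seqs, 2)
        a_chars, b_chars = list(a), list(b)
        rng.shuffle(a_chars)  # shuffling keeps composition, destroys order
        rng.shuffle(b_chars)
        scores.append(similarity("".join(a_chars), "".join(b_chars)))
    return statistics.mean(scores) + k * statistics.stdev(scores)


def cluster(seqs, threshold):
    """Single-linkage grouping: link every pair whose similarity exceeds threshold."""
    parent = list(range(len(seqs)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i, j in combinations(range(len(seqs)), 2):
        if similarity(seqs[i], seqs[j]) > threshold:
            parent[find(i)] = find(j)

    groups = {}
    for i in range(len(seqs)):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())


def representative(group, seqs):
    """Pick the member with the highest total similarity to the rest of its group."""
    return max(
        group,
        key=lambda i: sum(similarity(seqs[i], seqs[j]) for j in group if j != i),
    )


if __name__ == "__main__":
    # Toy sequences invented for illustration only.
    toy = [
        "MKTAYIAKQRQISFVKSHFSRQ",
        "MKTAYIAKQRQISFVKSHFSRE",
        "GSSGSSGSSGSSGSSGSSGSSG",
    ]
    cutoff = random_threshold(toy)
    for group in cluster(toy, cutoff):
        print(group, "->", toy[representative(group, toy)])
```

In practice the similarity function would be replaced by a proper alignment score, and the paper additionally clusters by secondary structure content, which this sketch does not attempt.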
Pages: 265-276
Page count: 12