Mathematical model for empirically optimizing large scale production of soluble protein domains

Times cited: 6
Authors
Chikayama, Eisuke [1]
Kurotani, Atsushi [1]
Tanaka, Takanori [1]
Yabuki, Takashi [1]
Miyazaki, Satoshi [1,2]
Yokoyama, Shigeyuki [1,2]
Kuroda, Yutaka [3]
Affiliations
[1] RIKEN, Genom Sci Ctr, Tsurumi Ku, Yokohama, Kanagawa 2300045, Japan
[2] Univ Tokyo, Grad Sch Sci, Dept Biophys & Biochem, Bunkyo Ku, Tokyo 1130033, Japan
[3] Tokyo Univ Agr & Technol, Fac Technol, Dept Biotechnol & Life Sci, Tokyo 1840012, Japan
Keywords
STRUCTURAL GENOMICS; PREDICTION; IDENTIFICATION; DATABASE; PROTEOLYSIS; SEQUENCES; TOPOLOGY; SEARCH
DOI
10.1186/1471-2105-11-113
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Background: Efficient dissection of large proteins into their structural domains is critical for high-throughput proteome analysis. So far, no study has focused on mathematically modeling a protein dissection protocol as a production system. Here, we report a mathematical model for empirically optimizing the cost of large-scale domain production in proteomics research. Results: The model computes the expected number of successfully produced soluble domains, using a conditional probability between domain identification and boundary identification. Typical values for the model's parameters were estimated from the experimental results of identifying soluble domains in the 2,032 Kazusa HUGE protein sequences. Among the 215 fragments corresponding to the 24 domains that were expressed correctly, 111, corresponding to 18 domains, were soluble. Our model indicates that, under the conditions used in our pilot experiment, the probability of correctly predicting the existence of a domain was 81% (175/215) and that of correctly predicting its boundary was 63% (111/175). Under these conditions, the most cost- and effort-effective way to produce soluble domains was to prepare one to seven fragments per predicted domain. Conclusions: Our mathematical modeling of protein dissection protocols indicates that the optimum number of fragments tested per domain is actually much smaller than expected a priori. The application range of our model is not limited to protein dissection; it can also be used for designing various large-scale mutational analyses or screening libraries.
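The two probabilities quoted in the abstract follow directly from the reported fragment counts, and a short sketch can make the arithmetic concrete. The Python below first recomputes 175/215 and 111/175. It then evaluates a deliberately simplified yield estimate. That estimate is an assumption made here for illustration, not the formula from the paper: it treats each of the n fragments designed for a predicted domain as an independent trial that hits a correct boundary with probability p_boundary. The function name `expected_soluble_domains` and its parameters are hypothetical.

```python
# Fragment counts as reported in the abstract.
FRAGMENTS_EXPRESSED = 215   # fragments expressed correctly
FRAGMENTS_DOMAIN_OK = 175   # denominator of the boundary probability
FRAGMENTS_SOLUBLE = 111     # soluble fragments

# Probabilities quoted in the abstract (about 81% and 63%).
p_domain = FRAGMENTS_DOMAIN_OK / FRAGMENTS_EXPRESSED
p_boundary = FRAGMENTS_SOLUBLE / FRAGMENTS_DOMAIN_OK


def expected_soluble_domains(n_domains, n_fragments,
                             p_d=p_domain, p_b=p_boundary):
    """Illustrative estimate only (NOT the paper's model).

    Assumes a predicted domain exists with probability p_d and that each
    of its n_fragments independently yields a soluble construct with
    conditional probability p_b. Returns the expected number of domains
    with at least one soluble fragment.
    """
    if n_fragments < 1:
        raise ValueError("n_fragments must be at least 1")
    p_at_least_one = 1.0 - (1.0 - p_b) ** n_fragments
    return n_domains * p_d * p_at_least_one


if __name__ == "__main__":
    print(f"p_domain   = {p_domain:.3f}")
    print(f"p_boundary = {p_boundary:.3f}")
    # Range of fragment counts mentioned in the abstract (one to seven).
    for n in range(1, 8):
        print(n, round(expected_soluble_domains(100, n), 1))
```

Under this toy assumption the expected yield rises quickly and then flattens as more fragments are added per domain, which is at least consistent with the abstract's conclusion that a small number of fragments per domain suffices. The actual cost optimization in the paper may use different terms.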
Pages: 9