Effect of training datasets on support vector machine prediction of protein-protein interactions

Cited by: 62
Authors
Lo, SL
Cai, CZ
Chen, YZ
Chung, MCM
Affiliations
[1] Natl Univ Singapore, Dept Biochem, Singapore 117597, Singapore
[2] Natl Univ Singapore, Bioproc Technol Inst, Singapore 117597, Singapore
[3] Natl Univ Singapore, Dept Computat Sci, Singapore 117597, Singapore
[4] Natl Univ Singapore, Dept Biochem, Singapore 117597, Singapore
Keywords
database of interacting proteins; protein function prediction; protein-protein interaction; shuffled sequence; support vector machine; SVMlight;
DOI
10.1002/pmic.200401118
Chinese Library Classification number
Q5 [Biochemistry];
Subject classification code
071010 ; 081704 ;
Abstract
Knowledge of protein-protein interactions is useful for elucidating protein function via the concept of 'guilt-by-association'. A statistical learning method, the Support Vector Machine (SVM), has recently been explored for predicting protein-protein interactions using artificial shuffled sequences as hypothetical noninteracting proteins, and it has shown promising results (Bock, J. R., Gough, D. A., Bioinformatics 2001, 17, 455-460). It remains unclear, however, how the prediction accuracy is affected if real protein sequences are used to represent noninteracting proteins. In this work, this effect is assessed by comparing the results obtained with real protein sequences against those obtained with shuffled sequences. The real protein sequences of hypothetical noninteracting proteins are generated from an exclusion analysis combined with subcellular localization information for interacting proteins found in the Database of Interacting Proteins. Prediction accuracy using real protein sequences is 76.9%, compared to 94.1% using artificial shuffled sequences. The discrepancy likely arises because separating two sets of real protein sequences is expected to be more difficult than separating a set of real protein sequences from a set of artificial sequences. The use of real protein sequences for training an SVM classification system is expected to give better prediction results in practical cases. This is tested by using both SVM systems to predict putative protein partners of a set of thioredoxin-related proteins. The prediction results are consistent with observations, suggesting that real sequences are more practically useful in developing SVM classification systems for protein-protein interaction prediction.
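The shuffled-sequence negatives mentioned in the abstract can be illustrated with a minimal sketch. It assumes a plain random permutation of residues, which may differ from the procedure actually used in the cited work; the function names and the example string are hypothetical and not taken from the paper. The sketch shows that such a decoy keeps the amino acid composition of the original while scrambling residue order, which is one plausible reason a classifier could separate real sequences from shuffled ones more easily than it separates two sets of real sequences.

```python
import random
from collections import Counter


def shuffle_sequence(seq: str, rng: random.Random) -> str:
    """Return a random permutation of the residues in seq (assumed decoy scheme)."""
    residues = list(seq)
    rng.shuffle(residues)
    return "".join(residues)


def dipeptide_counts(seq: str) -> Counter:
    """Count overlapping residue pairs, a simple order-sensitive feature."""
    return Counter(seq[i:i + 2] for i in range(len(seq) - 1))


rng = random.Random(0)  # fixed seed so the sketch is reproducible
# Arbitrary illustrative residue string, not a sequence from the paper.
real = "ACDEFGHIKLMNPQRSTVWYACDEFGHIKL"
decoy = shuffle_sequence(real, rng)

# Composition is preserved by construction; residue order generally is not.
same_composition = Counter(real) == Counter(decoy)
real_pairs = dipeptide_counts(real)
decoy_pairs = dipeptide_counts(decoy)
```

Comparing `real_pairs` with `decoy_pairs` gives a quick view of how much local order the permutation removed for a given sequence.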
Pages: 876-884
Number of pages: 9
Related papers
50 in total
  • [1] Prediction of Protein-Protein Interactions Based on Molecular Interface Features and the Support Vector Machine
    Zhou, Weiqiang
    Yan, Hong
    Fan, Xiaodan
    Hao, Quan
    CURRENT BIOINFORMATICS, 2013, 8 (01) : 3 - 8
  • [2] Prediction of protein-protein interactions through support vector machines
    Arango Rodriguez, J. D.
    Jaramillo-Garzon, J. A.
    Arroyave-Ospina, J. C.
    2015 20TH SYMPOSIUM ON SIGNAL PROCESSING, IMAGES AND COMPUTER VISION (STSIVA), 2015,
  • [3] Prediction of protein-protein interactions using support vector machines
    Dohkan, S
    Koike, A
    Takagi, T
    BIBE 2004: FOURTH IEEE SYMPOSIUM ON BIOINFORMATICS AND BIOENGINEERING, PROCEEDINGS, 2004, : 576 - 583
  • [4] Prediction of Protein-Protein Interaction with Pairwise Kernel Support Vector Machine
    Zhang, Shao-Wu
    Hao, Li-Yang
    Zhang, Ting-He
    INTERNATIONAL JOURNAL OF MOLECULAR SCIENCES, 2014, 15 (02): : 3220 - 3233
  • [5] PPI-Detect: A Support Vector Machine Model for Sequence-Based Prediction of Protein-Protein Interactions
    Romero-Molina, Sandra
    Ruiz-Blanco, Yasser B.
    Harms, Mirja
    Muench, Jan
    Sanchez-Garcia, Elsa
    JOURNAL OF COMPUTATIONAL CHEMISTRY, 2019, 40 (11) : 1233 - 1242
  • [6] Sequence-based protein-protein interaction prediction via support vector machine
    Yongcui Wang
    Jiguang Wang
    Zhixia Yang
    Naiyang Deng
    Journal of Systems Science and Complexity, 2010, 23 : 1012 - 1023
  • [7] Sequence-based protein-protein interaction prediction via support vector machine
    Wang, Yongcui
    Wang, Jiguang
    Yang, Zhixia
    Deng, Naiyang
    JOURNAL OF SYSTEMS SCIENCE & COMPLEXITY, 2010, 23 (05) : 1012 - 1023
  • [8] Prediction of Protein-Protein Interaction Sites by Using Autocorrelation Descriptor and Support Vector Machine
    Ren, Xiao-Ming
    Xia, Jun-Feng
    ADVANCED INTELLIGENT COMPUTING THEORIES AND APPLICATIONS: WITH ASPECTS OF ARTIFICIAL INTELLIGENCE, 2010, 6216 : 76 - 82
  • [9] Protein-Protein Recognition Prediction Using Support Vector Machine Based on Feature Vectors
    Kuo, Huang-Cheng
    Ong, Ping-Lin
    Lin, Jung-Chang
    Huang, Jen-Peng
    2008 IEEE INTERNATIONAL CONFERENCE ON BIOINFORMATICS AND BIOMEDICINE WORKSHOPS, PROCEEDINGS, 2008, : 200 - +
  • [10] PreBIND and Textomy - mining the biomedical literature for protein-protein interactions using a support vector machine
    Donaldson, I
    Martin, J
    de Bruijn, B
    Wolting, C
    Lay, V
    Tuekam, B
    Zhang, SD
    Baskin, B
    Bader, GD
    Michalickova, K
    Pawson, T
    Hogue, CWV
    BMC BIOINFORMATICS, 2003, 4 (1)