Effect of training datasets on support vector machine prediction of protein-protein interactions

Cited by: 62
Authors
Lo, SL
Cai, CZ
Chen, YZ
Chung, MCM
Affiliations
[1] Natl Univ Singapore, Dept Biochem, Singapore 117597, Singapore
[2] Natl Univ Singapore, Bioproc Technol Inst, Singapore 117597, Singapore
[3] Natl Univ Singapore, Dept Computat Sci, Singapore 117597, Singapore
[4] Natl Univ Singapore, Dept Biochem, Singapore 117597, Singapore
Keywords
database of interacting proteins; protein function prediction; protein-protein interaction; shuffled sequence; support vector machine; SVMlight
DOI
10.1002/pmic.200401118
Chinese Library Classification number
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
Knowledge of protein-protein interactions is useful for elucidating protein function via the concept of 'guilt-by-association'. A statistical learning method, the Support Vector Machine (SVM), has recently been explored for predicting protein-protein interactions using artificial shuffled sequences as hypothetical noninteracting proteins, with promising results (Bock, J. R., Gough, D. A., Bioinformatics 2001, 17, 455-460). It remains unclear, however, how prediction accuracy is affected if real protein sequences are used to represent noninteracting proteins. In this work, this effect is assessed by comparing the results derived from real protein sequences with those derived from shuffled sequences. The real protein sequences of hypothetical noninteracting proteins are generated from an exclusion analysis combined with subcellular localization information for interacting proteins found in the Database of Interacting Proteins. Prediction accuracy using real protein sequences is 76.9%, compared to 94.1% using artificial shuffled sequences. The discrepancy likely arises because separating two sets of real protein sequences is expected to be more difficult than separating a set of real protein sequences from a set of artificial sequences. Using real protein sequences to train an SVM classification system is expected to give better prediction results in practical cases. This is tested by using both SVM systems to predict putative protein partners of a set of thioredoxin-related proteins. The prediction results are consistent with observations, suggesting that real sequences are more practically useful in developing SVM classification systems for protein-protein interaction prediction.
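The shuffled-sequence idea in the abstract can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function names (`shuffle_sequence`, `composition`, `dipeptide_counts`) and the toy sequence are assumptions made for illustration, and no SVM is trained here. The sketch only shows that shuffling keeps amino-acid composition fixed while scrambling residue order, so any separation between real and shuffled sequences has to come from order-dependent features.

```python
import random
from collections import Counter

# The 20 standard amino acids, in one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def shuffle_sequence(seq, rng):
    """Return a random permutation of the residues in seq."""
    residues = list(seq)
    rng.shuffle(residues)
    return "".join(residues)


def composition(seq):
    """Fraction of each standard amino acid in seq (order-independent)."""
    counts = Counter(seq)
    return [counts[a] / len(seq) for a in AMINO_ACIDS]


def dipeptide_counts(seq):
    """Counts of adjacent residue pairs in seq (order-dependent)."""
    return Counter(seq[i:i + 2] for i in range(len(seq) - 1))


# Arbitrary toy string for illustration, not a real database record.
real = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"
rng = random.Random(0)  # fixed seed so the run is repeatable
shuffled = shuffle_sequence(real, rng)

# Composition is preserved exactly by shuffling.
same_composition = composition(real) == composition(shuffled)
# Dipeptide counts usually change, because residue order is scrambled.
same_dipeptides = dipeptide_counts(real) == dipeptide_counts(shuffled)

print("shuffled sequence:", shuffled)
print("composition preserved:", same_composition)
print("dipeptide counts identical:", same_dipeptides)
```

A negative set built this way differs from real proteins only in residue order, which is consistent with the abstract's suggestion that real-versus-shuffled is an easier separation task than real-versus-real.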
Pages: 876-884
Number of pages: 9