Protein-protein interaction sites prediction by ensembling SVM and sample-weighted random forests

Cited by: 93
Authors
Wei, Zhi-Sen [1]
Han, Ke [1]
Yang, Jing-Yu [1]
Shen, Hong-Bin [2]
Yu, Dong-Jun [1]
Affiliations
[1] Nanjing Univ Sci & Technol, Sch Comp Sci & Engn, Xiaolingwei 200, Nanjing 210094, Jiangsu, Peoples R China
[2] Shanghai Jiao Tong Univ, Inst Image Proc & Pattern Recognit, Dongchuan Rd 800, Shanghai 200240, Peoples R China
Funding
China Postdoctoral Science Foundation; National Natural Science Foundation of China
Keywords
Protein-protein interaction sites; Sequence-based prediction; Imbalanced learning; Support vector machine; Random forests; Classifier ensemble; SEQUENCE-BASED PREDICTION; BINDING RESIDUES PREDICTION; SOLVENT ACCESSIBILITY; IDENTIFICATION; CLASSIFIER; PROFILE; BLAST; AREA;
DOI
10.1016/j.neucom.2016.02.022
Chinese Library Classification
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Predicting protein-protein interaction (PPI) sites from protein sequences remains a challenging task in computational biology. PPI site prediction suffers from severe class imbalance, which degrades the overall performance of traditional statistical machine-learning classifiers such as SVM and random forests. In this study, an ensemble of SVM and sample-weighted random forests (SSWRF) is proposed to deal with class imbalance. An SVM classifier is first trained and applied to estimate the weights of the training samples. The training samples, together with their estimated weights, are then used to train a sample-weighted random forests (SWRF) classifier. In addition, a lower-dimensional feature representation, consisting of evolutionary conservation, hydrophobicity, and solvent accessibility features derived from a target residue and its neighbors, was developed to improve the discriminative capability for PPI site prediction. Feature importance analysis shows that the proposed representation is effective for predicting PPI sites. The proposed SSWRF achieved 22.4% and 35.1% in MCC and F-measure, respectively, on the independent validation dataset Dtestset72, and 15.2% and 36.5% in MCC and F-measure, respectively, on PDBtestset164. Computational comparisons with existing PPI site predictors on benchmark datasets demonstrated that SSWRF is effective for PPI site prediction and outperforms LORIS, the most recently released state-of-the-art sequence-based method. The benchmark datasets used in this study and the source code of the proposed method are publicly available at http://csbio.njust.edu.cn/bioinf/SSWRF for academic use. (C) 2016 Elsevier B.V. All rights reserved.
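The abstract reports results as MCC and F-measure rather than plain accuracy, which is the usual choice when interaction sites are a small minority of residues. The following is a minimal sketch of how the two metrics are computed from confusion-matrix counts. The function names and the toy labels are illustrative assumptions and are not taken from the paper or its released code; the values here are fractions, whereas the abstract quotes them as percentages.

```python
import math


def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels, where 1 = interaction site."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == 1 and t == 1:
            tp += 1
        elif p == 1:
            fp += 1
        elif t == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def accuracy(tp, fp, tn, fn):
    total = tp + fp + tn + fn
    return (tp + tn) / total if total else 0.0


def f_measure(tp, fp, tn, fn):
    # Harmonic mean of precision and recall, written in terms of counts.
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def mcc(tp, fp, tn, fn):
    # Matthews correlation coefficient; defined as 0 when any marginal is empty.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Toy data (hypothetical): 100 residues, 10 of which are interaction sites.
y_true = [1] * 10 + [0] * 90
# Predictor A labels every residue as non-interacting.
all_negative = [0] * 100
# Predictor B finds 5 of the 10 sites at the cost of 10 false positives.
partial = [1] * 5 + [0] * 5 + [1] * 10 + [0] * 80

for name, y_pred in [("all-negative", all_negative), ("partial", partial)]:
    counts = confusion_counts(y_true, y_pred)
    print(name, accuracy(*counts), f_measure(*counts), mcc(*counts))
```

On this toy data the all-negative predictor reaches an accuracy of 0.90 but an F-measure and MCC of 0, while the partial predictor has a lower accuracy of 0.85 but an F-measure of 0.40 and an MCC of roughly 0.33. This is why imbalance-aware metrics are preferred for this task.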
Pages: 201-212
Page count: 12