Protein-protein interaction sites prediction by ensembling SVM and sample-weighted random forests

Cited by: 93
Authors
Wei, Zhi-Sen [1]
Han, Ke [1]
Yang, Jing-Yu [1]
Shen, Hong-Bin [2]
Yu, Dong-Jun [1]
Affiliations
[1] Nanjing Univ Sci & Technol, Sch Comp Sci & Engn, Xiaolingwei 200, Nanjing 210094, Jiangsu, Peoples R China
[2] Shanghai Jiao Tong Univ, Inst Image Proc & Pattern Recognit, Dongchuan Rd 800, Shanghai 200240, Peoples R China
Funding
China Postdoctoral Science Foundation; National Natural Science Foundation of China;
Keywords
Protein-protein interaction sites; Sequence-based prediction; Imbalanced learning; Support vector machine; Random forests; Classifier ensemble; SEQUENCE-BASED PREDICTION; BINDING RESIDUES PREDICTION; SOLVENT ACCESSIBILITY; IDENTIFICATION; CLASSIFIER; PROFILE; BLAST; AREA;
DOI
10.1016/j.neucom.2016.02.022
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Predicting protein-protein interaction (PPI) sites from protein sequences is still a challenging task in computational biology. PPI site prediction suffers from severe class imbalance, which degrades the overall performance of traditional statistical machine-learning classifiers such as SVM and random forests. In this study, an ensemble of SVM and sample-weighted random forests (SSWRF) was proposed to deal with class imbalance. An SVM classifier was first trained and applied to estimate the weights of the training samples. The training samples, together with their estimated weights, were then used to train a sample-weighted random forest (SWRF). In addition, a lower-dimensional feature representation, consisting of evolutionary conservation, hydrophobic property, and solvent accessibility features derived from a target residue and its neighbors, was developed to improve the discriminative capability for PPI site prediction. The analysis of feature importance shows that the proposed representation is effective for predicting PPI sites. The proposed SSWRF achieved 22.4% and 35.1% in MCC and F-measure, respectively, on the independent validation dataset Dtestset72, and 15.2% and 36.5% in MCC and F-measure, respectively, on PDBtestset164. Computational comparisons with existing PPI site predictors on benchmark datasets demonstrated that the proposed SSWRF is effective for PPI site prediction and outperforms the most recently released state-of-the-art sequence-based method (i.e., LORIS). The benchmark datasets used in this study and the source code of the proposed method are publicly available at http://csbio.njust.edu.cn/bioinf/SSWRF for academic use. (C) 2016 Elsevier B.V. All rights reserved.
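The abstract reports results as MCC and F-measure rather than accuracy, which is the usual choice when interface residues are a small minority. The following is a minimal sketch of how the two metrics are computed from per-residue binary labels. It is not taken from the SSWRF source code: the function names and the toy counts (100 residues, 10 of them interface residues) are assumptions made purely for illustration.

```python
import math


def confusion_counts(y_true, y_pred):
    """Count TP, FP, TN, FN for binary labels (1 = interface residue)."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 0 and p == 1:
            fp += 1
        elif t == 0 and p == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def accuracy(tp, fp, tn, fn):
    total = tp + fp + tn + fn
    return (tp + tn) / total if total else 0.0


def f_measure(tp, fp, fn):
    """F1 score: harmonic mean of precision and recall."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def mcc(tp, fp, tn, fn):
    """Matthews correlation coefficient; defined as 0 when a margin is empty."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Hypothetical toy data: 10 interface residues among 100 (illustrative only).
y_true = [1] * 10 + [0] * 90

# Predictor A labels every residue as non-interface.
pred_a = [0] * 100

# Predictor B finds 6 of the 10 interface residues, with 12 false positives.
pred_b = [1] * 6 + [0] * 4 + [1] * 12 + [0] * 78

for name, pred in (("A", pred_a), ("B", pred_b)):
    tp, fp, tn, fn = confusion_counts(y_true, pred)
    print(name,
          round(accuracy(tp, fp, tn, fn), 3),
          round(f_measure(tp, fp, fn), 3),
          round(mcc(tp, fp, tn, fn), 3))
```

In this toy case, predictor A has the higher accuracy even though it never finds an interface residue, while its F-measure and MCC are both zero. This is why the two metrics are more informative than accuracy for imbalanced PPI site data.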
Pages: 201-212
Number of pages: 12
Related papers
50 in total
  • [1] Prediction of Protein-Protein Interaction Sites in Sequences and 3D Structures by Random Forests
    Sikic, Mile
    Tomic, Sanja
    Vlahovicek, Kristian
    PLOS COMPUTATIONAL BIOLOGY, 2009, 5 (01)
  • [2] Protein-protein interaction sites prediction by ensemble random forests with synthetic minority oversampling technique
    Wang, Xiaoying
    Yu, Bin
    Ma, Anjun
    Chen, Cheng
    Liu, Bingqiang
    Ma, Qin
    BIOINFORMATICS, 2019, 35 (14) : 2395 - 2402
  • [3] A Cascade Random Forests Algorithm for Predicting Protein-Protein Interaction Sites
    Wei, Zhi-Sen
    Yang, Jing-Yu
    Shen, Hong-Bin
    Yu, Dong-Jun
    IEEE TRANSACTIONS ON NANOBIOSCIENCE, 2015, 14 (07) : 746 - 760
  • [4] Prediction of Protein-Protein Interaction Sites by Random Forest Algorithm with mRMR and IFS
    Li, Bi-Qing
    Feng, Kai-Yan
    Chen, Lei
    Huang, Tao
    Cai, Yu-Dong
    PLOS ONE, 2012, 7 (08)
  • [5] Prediction of protein-protein interaction sites by means of ensemble learning and weighted feature descriptor
    Du, Xiuquan
    Sun, Shiwei
    Hu, Changlin
    Li, Xinrui
    Xia, Junfeng
    JOURNAL OF BIOLOGICAL RESEARCH-THESSALONIKI, 2016, 23
  • [6] Predicting Protein-Protein Interactions with Weighted PSSM Histogram and Random Forests
    Wei, Zhi-Sen
    Yang, Jing-Yu
    Yu, Dong-Jun
    INTELLIGENCE SCIENCE AND BIG DATA ENGINEERING: BIG DATA AND MACHINE LEARNING TECHNIQUES, ISCIDE 2015, PT II, 2015, 9243 : 326 - 335
  • [7] Protein-Protein Interaction Prediction Using Single Class SVM
    Lei, Hairong
    Kniss, Joe Michael
    SEVENTH INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND APPLICATIONS, PROCEEDINGS, 2008, : 883 - +
  • [8] Predicting Protein-Protein Interaction Sites by Rotation Forests with Evolutionary Information
    Hu, Xinying
    Jing, Anqi
    Du, Xiuquan
    INTELLIGENT COMPUTING IN BIOINFORMATICS, 2014, 8590 : 271 - 279
  • [9] Prediction of Protein-Protein Interaction Based on Weighted Feature Fusion
    Zhang, Chunhua
    Guo, Sijia
    Zhang, Jingbo
    Jin, Xizi
    Li, Yanwen
    Du, Ning
    Sun, Pingping
    Jiang, Baohua
    LETTERS IN ORGANIC CHEMISTRY, 2019, 16 (04) : 263 - 274
  • [10] Essential Protein Detection by Random Walk on Weighted Protein-Protein Interaction Networks
    Xu, Bin
    Guan, Jihong
    Wang, Yang
    Wang, Zewei
    IEEE-ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS, 2019, 16 (02) : 377 - 387