Protein-protein interaction sites prediction by ensembling SVM and sample-weighted random forests

Cited by: 93
Authors
Wei, Zhi-Sen [1]
Han, Ke [1]
Yang, Jing-Yu [1]
Shen, Hong-Bin [2]
Yu, Dong-Jun [1]
Affiliations
[1] Nanjing Univ Sci & Technol, Sch Comp Sci & Engn, Xiaolingwei 200, Nanjing 210094, Jiangsu, Peoples R China
[2] Shanghai Jiao Tong Univ, Inst Image Proc & Pattern Recognit, Dongchuan Rd 800, Shanghai 200240, Peoples R China
Funding
China Postdoctoral Science Foundation; National Natural Science Foundation of China;
Keywords
Protein-protein interaction sites; Sequence-based prediction; Imbalanced learning; Support vector machine; Random forests; Classifier ensemble; SEQUENCE-BASED PREDICTION; BINDING RESIDUES PREDICTION; SOLVENT ACCESSIBILITY; IDENTIFICATION; CLASSIFIER; PROFILE; BLAST; AREA;
DOI
10.1016/j.neucom.2016.02.022
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Predicting protein-protein interaction (PPI) sites from protein sequences remains a challenging task in computational biology. PPI site prediction suffers from severe class imbalance, which degrades the overall performance of traditional statistical machine-learning classifiers such as SVM and random forests. In this study, an ensemble of SVM and sample-weighted random forests (SSWRF) was proposed to deal with class imbalance. An SVM classifier was trained and applied to estimate the weights of the training samples. The training samples, together with their estimated weights, were then used to train a sample-weighted random forest (SWRF). In addition, a lower-dimensional feature representation, which consists of evolutionary conservation, hydrophobic property, and solvent accessibility features derived from a target residue and its neighbors, was developed to improve the discriminative capability for PPI site prediction. The analysis of feature importance shows that the proposed feature representation is effective for predicting PPI sites. The proposed SSWRF achieved 22.4% and 35.1% in MCC and F-measure, respectively, on the independent validation dataset Dtestset72, and 15.2% and 36.5% in MCC and F-measure, respectively, on PDBtestset164. Computational comparisons with existing PPI site predictors on benchmark datasets demonstrated that the proposed SSWRF is effective for PPI site prediction and outperforms LORIS, the most recently released state-of-the-art sequence-based method. The benchmark datasets used in this study and the source code of the proposed method are publicly available at http://csbio.njust.edu.cn/bioinf/SSWRF for academic use. (C) 2016 Elsevier B.V. All rights reserved.
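The abstract reports results as MCC and F-measure rather than accuracy, which is the usual choice when classes are heavily imbalanced. The sketch below is not taken from the paper: it uses the standard textbook definitions of both metrics, and the function names and toy labels (10 interface residues out of 100) are invented purely for illustration.

```python
import math

def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels, where 1 = interaction site."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn

def f_measure(tp, fp, fn):
    """F-measure (F1): harmonic mean of precision and recall."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def mcc(tp, fp, tn, fn):
    """Matthews correlation coefficient; defined as 0 when a marginal is empty."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

def accuracy(tp, fp, tn, fn):
    return (tp + tn) / (tp + fp + tn + fn)

# Hypothetical toy data: 10 positive residues followed by 90 negatives.
y_true = [1] * 10 + [0] * 90
all_negative = [0] * 100                                # never predicts a site
some_hits = [1] * 4 + [0] * 6 + [1] * 6 + [0] * 84      # 4 TP, 6 FN, 6 FP, 84 TN

for name, y_pred in [("all_negative", all_negative), ("some_hits", some_hits)]:
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    print(name, accuracy(tp, fp, tn, fn), f_measure(tp, fp, fn), mcc(tp, fp, tn, fn))
```

On this toy data the trivial all-negative predictor has the higher accuracy (0.90 versus 0.88) yet scores 0 on both F-measure and MCC, while the predictor that finds some true sites scores 0.4 and about 0.33. This is why the two metrics are informative under class imbalance.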
Pages: 201 - 212
Number of pages: 12
Related papers
50 in total
  • [11] Prediction of protein-protein interaction sites in heterocomplexes with neural networks
    Fariselli, P
    Pazos, F
    Valencia, A
    Casadio, R
    EUROPEAN JOURNAL OF BIOCHEMISTRY, 2002, 269 (05) : 1356 - 1361
  • [12] Prediction of protein-protein interaction sites using patch analysis
    Jones, S
    Thornton, JM
    JOURNAL OF MOLECULAR BIOLOGY, 1997, 272 (01) : 133 - 143
  • [13] Prediction of protein-protein interaction sites using an ensemble method
    Lei Deng
    Jihong Guan
    Qiwen Dong
    Shuigeng Zhou
    BMC Bioinformatics, 10
  • [14] Prediction of protein-protein interaction sites using an ensemble method
    Deng, Lei
    Guan, Jihong
    Dong, Qiwen
    Zhou, Shuigeng
    BMC BIOINFORMATICS, 2009, 10
  • [15] Prediction of protein-protein interaction sites in intrinsically disordered proteins
    Chen, Ranran
    Li, Xinlu
    Yang, Yaqing
    Song, Xixi
    Wang, Cheng
    Qiao, Dongdong
    FRONTIERS IN MOLECULAR BIOSCIENCES, 2022, 9
  • [16] Information of binding sites improves prediction of protein-protein interaction
    Patel, Tapan
    Pillay, Manoj
    Jawa, Rahul
    Liao, Li
    ICMLA 2006: 5TH INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND APPLICATIONS, PROCEEDINGS, 2006, : 205 - +
  • [17] Prediction of protein-protein interacting sites by combining SVM algorithm with Bayesian method
    Wang, Bing
    Ge, Lu Sheng
    Huang, De-Shuang
    Wong, Hau San
    ICNC 2007: THIRD INTERNATIONAL CONFERENCE ON NATURAL COMPUTATION, VOL 2, PROCEEDINGS, 2007, : 329 - +
  • [18] Protein-protein interaction site prediction by model ensembling with hybrid feature and self-attention
    Cong, Hanhan
    Liu, Hong
    Cao, Yi
    Liang, Cheng
    Chen, Yuehui
    BMC BIOINFORMATICS, 2023, 24 (01)
  • [19] Protein-Protein Interaction Sites Prediction Based on an Under-Sampling Strategy and Random Forest Algorithm
    Li, Minjie
    Wu, Ziheng
    Wang, Wenyan
    Lu, Kun
    Zhang, Jun
    Zhou, Yuming
    Chen, Zhaoquan
    Li, Dan
    Zheng, Shicheng
    Chen, Peng
    Wang, Bing
    IEEE-ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS, 2022, 19 (06) : 3646 - 3654
  • [20] Prediction of Protein-Protein Interaction Sites Based on Naive Bayes Classifier
    Geng, Haijiang
    Lu, Tao
    Lin, Xiao
    Liu, Yu
    Yan, Fangrong
    BIOCHEMISTRY RESEARCH INTERNATIONAL, 2015, 2015