Protein-protein interaction sites prediction by ensembling SVM and sample-weighted random forests

Cited by: 93
Authors
Wei, Zhi-Sen [1 ]
Han, Ke [1 ]
Yang, Jing-Yu [1 ]
Shen, Hong-Bin [2 ]
Yu, Dong-Jun [1 ]
Affiliations
[1] Nanjing Univ Sci & Technol, Sch Comp Sci & Engn, Xiaolingwei 200, Nanjing 210094, Jiangsu, Peoples R China
[2] Shanghai Jiao Tong Univ, Inst Image Proc & Pattern Recognit, Dongchuan Rd 800, Shanghai 200240, Peoples R China
Funding
China Postdoctoral Science Foundation; National Natural Science Foundation of China;
Keywords
Protein-protein interaction sites; Sequence-based prediction; Imbalanced learning; Support vector machine; Random forests; Classifier ensemble; SEQUENCE-BASED PREDICTION; BINDING RESIDUES PREDICTION; SOLVENT ACCESSIBILITY; IDENTIFICATION; CLASSIFIER; PROFILE; BLAST; AREA;
DOI
10.1016/j.neucom.2016.02.022
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Predicting protein-protein interaction (PPI) sites from protein sequences is still a challenging task in computational biology. There is a severe class imbalance in predicting PPI sites, which decreases the overall performance of traditional statistical machine-learning-based classifiers, such as SVM and random forests. In this study, an ensemble of SVM and sample-weighted random forests (SSWRF) was proposed to deal with class imbalance. An SVM classifier was trained and applied to estimate the weights of training samples. Then, the training samples with estimated weights were used to train sample-weighted random forests (SWRF). In addition, a lower-dimensional feature representation method, which consists of evolutionary conservation, hydrophobic property, and solvent accessibility features derived from a target residue and its neighbors, was developed to improve the discriminative capability for PPI site prediction. The analysis of feature importance shows that the proposed feature representation method is effective for predicting PPI sites. The proposed SSWRF achieved 22.4% and 35.1% in MCC and F-measure, respectively, on the independent validation dataset Dtestset72, and achieved 15.2% and 36.5% in MCC and F-measure, respectively, on PDBtestset164. Computational comparisons with existing PPI site predictors on benchmark datasets demonstrated that the proposed SSWRF is effective for PPI site prediction and outperforms the most recently released state-of-the-art sequence-based method (i.e., LORIS). The benchmark datasets used in this study and the source code of the proposed method are publicly available at http://csbio.njust.edu.cn/bioinf/SSWRF for academic use. (C) 2016 Elsevier B.V. All rights reserved.
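The abstract reports results as MCC and F-measure, two metrics commonly used when classes are imbalanced. As a minimal sketch of how these standard metrics are computed from binary predictions (the function names and the toy labels are illustrative assumptions, not taken from the paper or its released code):

```python
import math

def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN for binary labels (1 = interaction site)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn

def f_measure(tp, fp, fn):
    """F-measure (F1): harmonic mean of precision and recall."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; returns 0.0 when undefined."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

# Hypothetical toy labels: 2 interface residues out of 10 (imbalanced).
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]

tp, tn, fp, fn = confusion_counts(y_true, y_pred)
print(f"F-measure = {f_measure(tp, fp, fn):.3f}")
print(f"MCC       = {mcc(tp, tn, fp, fn):.3f}")
```

On this toy example, accuracy would be 80% even though only one of the two interface residues is recovered, which is why MCC and F-measure are the more informative figures for a task like this.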
Pages: 201 - 212
Page count: 12
Related papers
50 in total
  • [31] Identification of Protein Interaction Partners and Protein-Protein Interaction Sites
    Sacquin-Mora, Sophie
    Carbone, Alessandra
    Lavery, Richard
    JOURNAL OF MOLECULAR BIOLOGY, 2008, 382 (05) : 1276 - 1289
  • [32] Prediction of Protein-Protein Interactions with Physicochemical Descriptors and Wavelet Transform via Random Forests
    Jia, Jianhua
    Xiao, Xuan
    Liu, Bingxiang
    JALA, 2016, 21 (03): : 368 - 377
  • [33] The Prediction of Protein-Protein Interaction Sites Based on RBF Classifier Improved by SMOTE
    Li, Hui
    Pi, Dechang
    Wang, Chishe
    MATHEMATICAL PROBLEMS IN ENGINEERING, 2014, 2014
  • [34] Prediction of Protein-Protein Interaction Sites Combing Sequence Profile and Hydrophobic Information
    Peng, Lili
    Chen, Fang
    Zhou, Nian
    Chen, Peng
    Zhang, Jun
    Wang, Bing
    INTELLIGENT COMPUTING THEORIES AND APPLICATION, PT I, 2018, 10954 : 697 - 702
  • [35] Prediction of Protein-Protein Interaction Sites by Multifeature Fusion and RF with mRMR and IFS
    Zhang, JunYan
    Lyu, Yinghua
    Ma, Zhiqiang
    DISEASE MARKERS, 2022, 2022
  • [36] Prediction of Protein-Protein Interaction Sites Using Back Propagation Neural Networks
    Wang, Feilu
    Song, Yang
    2013 NINTH INTERNATIONAL CONFERENCE ON NATURAL COMPUTATION (ICNC), 2013, : 1057 - 1061
  • [37] A neural network method to improve prediction of protein-protein interaction sites in heterocomplexes
    Fariselli, P
    Zauli, A
    Rossi, I
    Finelli, M
    Martelli, PL
    Casadio, R
    2003 IEEE XIII WORKSHOP ON NEURAL NETWORKS FOR SIGNAL PROCESSING - NNSP'03, 2003, : 33 - 41
  • [38] A Transformer-Based Ensemble Framework for the Prediction of Protein-Protein Interaction Sites
    Mou, Minjie
    Pan, Ziqi
    Zhou, Zhimeng
    Zheng, Lingyan
    Zhang, Hanyu
    Shi, Shuiyang
    Li, Fengcheng
    Sun, Xiuna
    Zhu, Feng
    RESEARCH, 2023, 6
  • [39] SENSDeep: An Ensemble Deep Learning Method for Protein-Protein Interaction Sites Prediction
    Aybey, Engin
    Gumus, Ozgur
    INTERDISCIPLINARY SCIENCES-COMPUTATIONAL LIFE SCIENCES, 2023, 15 (01) : 55 - 87
  • [40] Fast prediction of protein-protein interaction sites based on Extreme Learning Machines
    Wang, Debby A.
    Wang, Ran
    Yan, Hong
    NEUROCOMPUTING, 2014, 128 : 258 - 266