Large-scale comparison of machine learning methods for profiling prediction of kinase inhibitors

Cited by: 8
Authors
Wu, Jiangxia [1]
Chen, Yihao [1]
Wu, Jingxing [1]
Zhao, Duancheng [1]
Huang, Jindi [1]
Lin, Mujie [1]
Wang, Ling [1]
Affiliations
[1] South China Univ Technol, Guangdong Prov Key Lab Fermentat & Enzyme Engn, Guangdong Prov Engn & Technol Res Ctr Biopharmaceu, Sch Biol & Biol Engn,Joint Int Res Lab Synthet Bio, Guangzhou 510006, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Kinase profiling; Machine learning; Deep learning; Molecular fingerprints; Molecular graphs; PROTEIN-KINASE; RANDOM FOREST; QSAR MODELS; DISCOVERY; SELECTIVITY; CLASSIFICATION; FAMILY; AGENTS; PAIRS;
DOI
10.1186/s13321-023-00799-5
Chinese Library Classification (CLC) number
O6 [Chemistry]
Discipline classification code
0703
Abstract
Conventional machine learning (ML) and deep learning (DL) play a key role in the selectivity prediction of kinase inhibitors. A number of models based on available datasets can be used to predict the kinase profiles of compounds, but the relative advantages and disadvantages of ML and DL for such tasks remain controversial. In this study, we constructed a comprehensive benchmark dataset of kinase inhibitors comprising 141,086 unique compounds and 216,823 well-defined bioassay data points for 354 kinases. We then systematically compared the performance of 12 ML and DL methods on the kinase profiling prediction task. Extensive experimental results reveal that (1) descriptor-based ML models generally slightly outperform fingerprint-based ML models in predictive performance, and random forest (RF), an ensemble learning approach, displays the best overall predictive performance. (2) Single-task graph-based DL models are generally inferior to conventional descriptor- and fingerprint-based ML models; however, the corresponding multi-task models generally improve the average accuracy of kinase profile prediction. For example, the multi-task FP-GNN model outperforms the conventional descriptor- and fingerprint-based ML models with an average AUC of 0.807. (3) Fusion models based on voting and stacking methods can further improve performance on the kinase profiling prediction task; specifically, the RF::AtomPairs + FP2 + RDKitDes fusion model performs best, with the highest average AUC value of 0.825 on the test sets. These findings provide useful guidance for choosing ML and DL methods for kinase profiling prediction tasks. Finally, an online platform called KIPP (https://kipp.idruglab.cn) and Python software were developed based on the best models to support kinase profiling prediction, as well as various kinase inhibitor identification tasks including virtual screening, compound repositioning and target fishing.
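The voting-based fusion mentioned in the abstract can be illustrated with a minimal sketch. It assumes soft voting, i.e. averaging the predicted activity probabilities of base models trained on different feature sets; the abstract does not state whether the authors use soft or hard voting, so this is an assumption. The helper names `roc_auc` and `soft_vote` are hypothetical, and the probabilities are made-up toy values for a single kinase, not results from the paper.

```python
from statistics import mean


def roc_auc(labels, scores):
    """ROC AUC as the fraction of (active, inactive) pairs in which the
    active compound gets the higher score; ties count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one active and one inactive")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def soft_vote(prob_lists):
    """Soft voting: average the per-compound probabilities of all base models."""
    return [mean(column) for column in zip(*prob_lists)]


# Toy data (assumed for illustration): 1 = active, 0 = inactive on one kinase.
labels = [1, 1, 1, 0, 0, 0]

# Hypothetical predicted probabilities from three base models, one per
# feature set named in the abstract (AtomPairs, FP2, RDKitDes).
base_probs = {
    "AtomPairs": [0.9, 0.4, 0.7, 0.5, 0.2, 0.1],
    "FP2": [0.8, 0.6, 0.3, 0.4, 0.5, 0.2],
    "RDKitDes": [0.6, 0.7, 0.6, 0.3, 0.4, 0.65],
}

fused = soft_vote(list(base_probs.values()))

for name, probs in base_probs.items():
    print(f"{name:10s} AUC = {roc_auc(labels, probs):.3f}")
print(f"{'fused':10s} AUC = {roc_auc(labels, fused):.3f}")
```

In a benchmark like the one described, such a per-kinase AUC would be computed for every kinase and then averaged; stacking would instead feed the base-model probabilities into a second-level learner rather than averaging them.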
Pages: 22
Related papers
50 in total
  • [1] Large-scale comparison of machine learning methods for profiling prediction of kinase inhibitors
    Jiangxia Wu
    Yihao Chen
    Jingxing Wu
    Duancheng Zhao
    Jindi Huang
    MuJie Lin
    Ling Wang
    Journal of Cheminformatics, 16
  • [2] Large-scale comparison of machine learning methods for drug target prediction on ChEMBL
    Mayr, Andreas
    Klambauer, Guenter
    Unterthiner, Thomas
    Steijaert, Marvin
    Wegner, Jorg K.
    Ceulemans, Hugo
    Clevert, Djork-Arne
    Hochreiter, Sepp
    CHEMICAL SCIENCE, 2018, 9 (24) : 5441 - 5451
  • [3] Optimization Methods for Large-Scale Machine Learning
    Bottou, Leon
    Curtis, Frank E.
    Nocedal, Jorge
    SIAM REVIEW, 2018, 60 (02) : 223 - 311
  • [4] Large-scale comparison of machine learning algorithms for target prediction of natural products
    Liang, Lu
    Liu, Ye
    Kang, Bo
    Wang, Ru
    Sun, Meng-Yu
    Wu, Qi
    Meng, Xiang-Fei
    Lin, Jian-Ping
    BRIEFINGS IN BIOINFORMATICS, 2022, 23 (05)
  • [5] Large-Scale Machine Learning for Business Sector Prediction
    Angenent, Mitch N.
    Barata, Antonio Pereira
    Takes, Frank W.
    PROCEEDINGS OF THE 35TH ANNUAL ACM SYMPOSIUM ON APPLIED COMPUTING (SAC'20), 2020, : 1143 - 1146
  • [6] A review of Nystrom methods for large-scale machine learning
    Sun, Shiliang
    Zhao, Jing
    Zhu, Jiang
    INFORMATION FUSION, 2015, 26 : 36 - 48
  • [7] Evaluation of Machine Learning Methods on Large-Scale Spatiotemporal Data for Photovoltaic Power Prediction
    Sauter, Evan
    Mughal, Maqsood
    Zhang, Ziming
    ENERGIES, 2023, 16 (13)
  • [8] Validating the validation: reanalyzing a large-scale comparison of deep learning and machine learning models for bioactivity prediction
    Matthew C. Robinson
    Robert C. Glen
    Alpha A. Lee
    Journal of Computer-Aided Molecular Design, 2020, 34 : 717 - 730
  • [9] Validating the validation: reanalyzing a large-scale comparison of deep learning and machine learning models for bioactivity prediction
    Robinson, Matthew C.
    Glen, Robert C.
    Lee, Alpha A.
    JOURNAL OF COMPUTER-AIDED MOLECULAR DESIGN, 2020, 34 (07) : 717 - 730
  • [10] Conformal Prediction in Spark: Large-Scale Machine Learning with Confidence
    Capuccini, Marco
    Carlsson, Lars
    Norinder, Ulf
    Spjuth, Ola
    2015 IEEE/ACM 2ND INTERNATIONAL SYMPOSIUM ON BIG DATA COMPUTING (BDC), 2015, : 61 - 67