Multiclass Synthetic Accessibility Prediction

Authors
Li, Xinqi [1]
Walsh, Ryan [2,3]
Abbas, Waseem [1]
Pascual-Diaz, Sergio [1]
Hand, Calum [1]
Garland, Rory [1]
Khan, Faiz Mohammad [1]
Das, Nikhil Mohan [1]
Desai, Vedant [1]
Abouzleikha, Mohamed [1]
Clark, Matthew A. [3]
Affiliations
[1] X Chem UK, Altrincham WA14 2DT, Cheshire, England
[2] X Chem Canada, Montreal, PQ H4R 2P1, Canada
[3] X Chem Global HQ, Waltham, MA 02453 USA
DOI
10.1021/acs.jcim.4c01663
Chinese Library Classification
R914 [Medicinal Chemistry];
Discipline classification code
100701;
Abstract
Evaluating synthetic accessibility of in silico molecules is an integral component of the drug discovery process. While the application of machine learning models to predict whether small molecules are easy or hard to synthesize has gained attention recently, predetermined thresholds and data set imbalances present challenges for these binary classification approaches. In this study, we introduce a novel multiclass fold-ensembled classification approach to predict the minimum number of steps needed to synthesize a small molecule. By ensembling the base models trained on multiple stratified subsampled folds, this approach effectively mitigates the impact of class imbalance through probability aggregation or voting aggregation strategies. Additionally, we propose fuzzy evaluation metrics that account for practical tolerances in predictions, providing a more flexible and realistic assessment of model performance. Through experimentation on two reaction benchmark data sets, we demonstrate the effectiveness of our model in a multiclass synthetic accessibility prediction task and the superiority of our proposed method over six existing models in binary synthetic accessibility prediction tasks.
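The fold-ensembling and fuzzy-evaluation ideas in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the choice to balance each fold by drawing the minority-class count from every class, and the reading of "fuzzy" as an absolute tolerance on the predicted step count are all assumptions made for illustration. Base-model training is omitted; the aggregation functions take per-model class probabilities as input.

```python
import numpy as np


def stratified_subsample_folds(y, n_folds, rng):
    """Return n_folds index arrays, each holding an equal number of samples per class.

    Assumption: each fold draws (without replacement) as many samples per class
    as the rarest class has; the paper's exact scheme may differ.
    """
    classes, counts = np.unique(y, return_counts=True)
    per_class = counts.min()
    folds = []
    for _ in range(n_folds):
        idx = np.concatenate([
            rng.choice(np.flatnonzero(y == c), size=per_class, replace=False)
            for c in classes
        ])
        folds.append(idx)
    return folds


def aggregate_probabilities(fold_probs):
    """Average class probabilities over base models, then take the argmax.

    fold_probs has shape (n_models, n_samples, n_classes).
    """
    return np.mean(fold_probs, axis=0).argmax(axis=1)


def aggregate_votes(fold_probs):
    """Let each base model vote for its top class; return the majority class.

    Ties resolve to the lowest class index (np.argmax behaviour).
    """
    n_classes = fold_probs.shape[2]
    votes = fold_probs.argmax(axis=2)                 # (n_models, n_samples)
    counts = np.eye(n_classes)[votes].sum(axis=0)     # (n_samples, n_classes)
    return counts.argmax(axis=1)


def fuzzy_accuracy(y_true, y_pred, tolerance=1):
    """Fraction of predictions within `tolerance` steps of the true class."""
    diff = np.abs(np.asarray(y_true) - np.asarray(y_pred))
    return float(np.mean(diff <= tolerance))


# Toy data: three base models, one molecule, three step-count classes.
toy_probs = np.array([
    [[0.9, 0.1, 0.0]],
    [[0.4, 0.6, 0.0]],
    [[0.4, 0.6, 0.0]],
])
prob_pred = aggregate_probabilities(toy_probs)   # one confident model dominates the mean
vote_pred = aggregate_votes(toy_probs)           # two of three models vote for class 1
```

In the toy data the two strategies disagree: averaging favours the class backed by one highly confident model, while voting follows the majority. With a tolerance of 0, `fuzzy_accuracy` reduces to ordinary accuracy.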
Pages: 1155-1165
Number of pages: 11