Multiclass Synthetic Accessibility Prediction

Cited by: 0
Authors
Li, Xinqi [1]
Walsh, Ryan [2, 3]
Abbas, Waseem [1]
Pascual-Diaz, Sergio [1]
Hand, Calum [1]
Garland, Rory [1]
Khan, Faiz Mohammad [1]
Das, Nikhil Mohan [1]
Desai, Vedant [1]
Abouzleikha, Mohamed [1]
Clark, Matthew A. [3]
Affiliations
[1] X Chem UK, Altrincham WA14 2DT, Cheshire, England
[2] X Chem Canada, Montreal, PQ H4R 2P1, Canada
[3] X Chem Global HQ, Waltham, MA 02453, USA
DOI
10.1021/acs.jcim.4c01663
Chinese Library Classification (CLC) number
R914 [Medicinal Chemistry]
Subject classification code
100701
Abstract
Evaluating the synthetic accessibility of molecules designed in silico is an integral part of the drug discovery process. Machine learning models that classify small molecules as easy or hard to synthesize have recently gained attention, but these binary classification approaches are hampered by predetermined thresholds and imbalanced data sets. In this study, we introduce a multiclass fold-ensembled classification approach that predicts the minimum number of steps needed to synthesize a small molecule. By ensembling base models trained on multiple stratified, subsampled folds, the approach mitigates the impact of class imbalance through either probability aggregation or voting aggregation. We also propose fuzzy evaluation metrics that account for practical tolerances in predictions, giving a more flexible and realistic assessment of model performance. Experiments on two reaction benchmark data sets demonstrate the effectiveness of our model on a multiclass synthetic accessibility prediction task and show that our method outperforms six existing models on binary synthetic accessibility prediction tasks.
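The abstract does not give implementation details, so the sketch below is only a rough illustration of the ideas it names, under several assumptions that are not taken from the paper: classes are ordinal step counts (class index 0..k), folds are built by a hypothetical class-balanced subsampling scheme (each fold draws as many examples from every class as the rarest class has), and "fuzzy accuracy" counts a prediction as correct when it lies within a chosen tolerance of the true step count. All function names are hypothetical, and base-model training is omitted; any classifier that outputs per-class probabilities could fill that role.

```python
import random
from collections import Counter
from typing import Dict, List, Sequence


def balanced_subsample_folds(labels: Sequence[int], n_folds: int, seed: int = 0) -> List[List[int]]:
    """Hypothetical fold builder (an assumption, not the authors' scheme):
    every fold draws the same number of examples from each class, equal to
    the size of the rarest class, so each base model trains on balanced data."""
    rng = random.Random(seed)
    by_class: Dict[int, List[int]] = {}
    for idx, y in enumerate(labels):
        by_class.setdefault(y, []).append(idx)
    cap = min(len(idxs) for idxs in by_class.values())
    folds = []
    for _ in range(n_folds):
        fold: List[int] = []
        for idxs in by_class.values():
            fold.extend(rng.sample(idxs, cap))  # no repeats within one fold
        folds.append(sorted(fold))
    return folds


def aggregate_probabilities(prob_rows: List[List[float]]) -> int:
    """Probability aggregation: average each class's probability over the
    base models and return the class with the highest mean
    (ties go to the lower class index, since max returns the first maximum)."""
    n_models = len(prob_rows)
    n_classes = len(prob_rows[0])
    mean = [sum(row[c] for row in prob_rows) / n_models for c in range(n_classes)]
    return max(range(n_classes), key=lambda c: mean[c])


def aggregate_votes(prob_rows: List[List[float]]) -> int:
    """Voting aggregation: each base model votes for its own argmax class;
    ties between classes go to the smallest class index (an arbitrary choice)."""
    votes = Counter(max(range(len(row)), key=lambda c: row[c]) for row in prob_rows)
    top = max(votes.values())
    return min(c for c, v in votes.items() if v == top)


def fuzzy_accuracy(y_true: Sequence[int], y_pred: Sequence[int], tolerance: int = 1) -> float:
    """Fraction of predictions within `tolerance` steps of the true step count.
    tolerance=0 reduces to ordinary exact-match accuracy."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    hits = sum(1 for t, p in zip(y_true, y_pred) if abs(t - p) <= tolerance)
    return hits / len(y_true)


if __name__ == "__main__":
    # Made-up probabilities from three base models for one molecule
    # over four step-count classes (class index 0..3).
    probs = [
        [0.1, 0.5, 0.3, 0.1],
        [0.1, 0.2, 0.6, 0.1],
        [0.0, 0.5, 0.4, 0.1],
    ]
    print(aggregate_probabilities(probs))  # 2
    print(aggregate_votes(probs))          # 1
    print(fuzzy_accuracy([1, 2, 3, 4], [1, 3, 1, 4], tolerance=1))  # 0.75
    print(fuzzy_accuracy([1, 2, 3, 4], [1, 3, 1, 4], tolerance=0))  # 0.5
    labels = [0, 0, 0, 0, 0, 0, 1, 1, 2, 2]
    for fold in balanced_subsample_folds(labels, n_folds=3):
        print(Counter(labels[i] for i in fold))
```

In this toy case the two aggregation strategies disagree: averaging favours class 2 because one model is confident about it, while voting picks class 1, which two of the three models rank first. Which strategy works better on real data is an empirical question the abstract does not settle.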
Pages: 1155-1165
Page count: 11