EnzymeMap: curation, validation and data-driven prediction of enzymatic reactions

Cited by: 7
Authors
Heid, Esther [1, 2]
Probst, Daniel [3]
Green, William H. [2]
Madsen, Georg K. H. [1]
Affiliations
[1] TU Wien, Inst Mat Chem, A-1060 Vienna, Austria
[2] MIT, Dept Chem Engn, Cambridge, MA 02139 USA
[3] IBM Res Europe, CH-8803 Ruschlikon, Switzerland
Funding
Austrian Science Fund
Keywords
BIOCATALYSIS; CASCADE; RETROSYNTHESIS; RESOURCE; OUTCOMES; DESIGN; TOOL
DOI
10.1039/d3sc02048g
Chinese Library Classification
O6 [Chemistry]
Discipline classification code
0703
Abstract
Enzymatic reactions are an ecofriendly, selective, and versatile addition, and sometimes even an alternative, to organic reactions for the synthesis of chemical compounds such as pharmaceuticals or fine chemicals. To identify suitable reactions, computational models that predict the activity of enzymes on non-native substrates, perform retrosynthetic pathway searches, or predict the outcomes of reactions, including regio- and stereoselectivity, are becoming increasingly important. However, current approaches are substantially hindered by the limited amount of available data, especially if balanced and atom-mapped reactions are needed and if the models feature machine learning components. We therefore constructed a high-quality dataset (EnzymeMap) by developing a large set of correction and validation algorithms for reactions recorded in the literature, and we showcase its significant positive impact on machine learning models of retrosynthesis, forward prediction, and regioselectivity prediction, which outperform previous approaches by a large margin. Our dataset allows for deep learning models of enzymatic reactions with unprecedented accuracy, and is freely available online. A new curation and atom-mapping routine, leading to a large database of enzymatic reactions, boosts the performance of deep learning models.
Pages: 14229-14242
Number of pages: 14