Learning continuous and data-driven molecular descriptors by translating equivalent chemical representations

Cited by: 315
Authors
Winter, Robin [1, 2]
Montanari, Floriane [1]
Noe, Frank [2]
Clevert, Djork-Arne [1]
Affiliations
[1] Bayer AG, Dept Bioinformat, Berlin, Germany
[2] Free Univ Berlin, Dept Math & Comp Sci, Berlin, Germany
Keywords
SETS; CLASSIFICATION; TOXICITY;
DOI
10.1039/c8sc04175j
Chinese Library Classification
O6 [Chemistry];
Discipline classification code
0703;
Abstract
There has been a recent surge of interest in using machine learning across chemical space in order to predict properties of molecules or design molecules and materials with the desired properties. Most of this work relies on defining clever feature representations, in which the chemical graph structure is encoded in a uniform way such that predictions across chemical space can be made. In this work, we propose to exploit the powerful ability of deep neural networks to learn a feature representation from low-level encodings of a huge corpus of chemical structures. Our model borrows ideas from neural machine translation: it translates between two semantically equivalent but syntactically different representations of molecular structures, compressing the meaningful information both representations have in common in a low-dimensional representation vector. Once the model is trained, this representation can be extracted for any new molecule and utilized as a descriptor. In fair benchmarks with respect to various human-engineered molecular fingerprints and graph-convolution models, our method shows competitive performance in modelling quantitative structure-activity relationships in all analysed datasets. Additionally, we show that our descriptor significantly outperforms all baseline molecular fingerprints in two ligand-based virtual screening tasks. Overall, our descriptors show the most consistent performances in all experiments. The continuity of the descriptor space and the existence of the decoder that permits deducing a chemical structure from an embedding vector allow for exploration of the space and open up new opportunities for compound optimization and idea generation.
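As a rough illustration of the "low-level encodings" mentioned in the abstract, the sketch below turns two syntactically different strings for the same molecule into integer token sequences of the kind a sequence-to-sequence translation model could take as source and target. The use of SMILES strings, the tokenizer regex, and all function names are assumptions made for illustration; the abstract does not specify them, and this is not the authors' implementation.

```python
import re

# Hypothetical tokenizer (an assumption, not taken from the paper): splits a
# SMILES string into bracket atoms, the two-letter halogens Br and Cl,
# two-digit ring closures (%NN), and otherwise single characters.
TOKEN_PATTERN = re.compile(r"\[[^\]]+\]|Br|Cl|%\d{2}|.")


def tokenize(smiles):
    """Return the list of tokens found in a SMILES string."""
    return TOKEN_PATTERN.findall(smiles)


def build_vocab(sequences):
    """Map every token seen in the given strings to an integer index."""
    tokens = sorted({tok for seq in sequences for tok in tokenize(seq)})
    return {tok: idx for idx, tok in enumerate(tokens)}


def encode(smiles, vocab):
    """Encode a SMILES string as a list of integer token indices."""
    return [vocab[tok] for tok in tokenize(smiles)]


# Each pair holds two valid SMILES strings for the same molecule: same
# chemical meaning, different syntax. A translation model would read the
# first string and be trained to emit the second.
pairs = [
    ("OCC", "CCO"),                # ethanol
    ("C(Cl)(Cl)Cl", "ClC(Cl)Cl"),  # chloroform
]

vocab = build_vocab([s for pair in pairs for s in pair])
encoded_pairs = [(encode(src, vocab), encode(tgt, vocab)) for src, tgt in pairs]
```

In a trained model of the kind the abstract describes, the encoder's fixed-length intermediate vector for the source sequence would serve as the molecular descriptor; the sketch stops at the input encoding and does not include any neural network.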
Pages: 1692-1701
Number of pages: 10