Multilingual Molecular Representation Learning via Contrastive Pre-training

Cited by: 0
Authors
Guo, Zhihui [1 ]
Sharma, Pramod [1 ]
Martinez, Andy [1 ]
Du, Liang [1 ]
Abraham, Robin [1 ]
Affiliations
[1] Microsoft Corp, Redmond, WA 98052 USA
Keywords
DESCRIPTORS; SIMILARITY
DOI
Not available
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Molecular representation learning plays an essential role in cheminformatics. Recently, language model-based approaches have gained popularity as an alternative to traditional expert-designed features for encoding molecules. However, these approaches utilize only a single molecular language for representation learning. Motivated by the fact that a given molecule can be described in different languages, such as the Simplified Molecular-Input Line-Entry System (SMILES), the International Union of Pure and Applied Chemistry (IUPAC) nomenclature, and the IUPAC International Chemical Identifier (InChI), we propose a multilingual molecular embedding generation approach called MM-Deacon (multilingual molecular domain embedding analysis via contrastive learning). MM-Deacon is pre-trained on large-scale molecular data using SMILES and IUPAC as two different languages. We evaluated the robustness of our method on seven molecular property prediction tasks from the MoleculeNet benchmark, on zero-shot cross-lingual retrieval, and on a drug-drug interaction prediction task.
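To make the pre-training objective concrete, below is a minimal PyTorch sketch of the cross-lingual contrastive setup the abstract describes: two language-specific encoders map paired SMILES and IUPAC strings into a shared embedding space, and a symmetric InfoNCE-style loss pulls matched pairs together while pushing other in-batch pairs apart. The encoder architecture, embedding dimension, mean pooling, projection head, temperature, and random token IDs are illustrative assumptions for this sketch, not the paper's exact configuration.

```python
# Sketch of cross-lingual contrastive pre-training in the spirit of MM-Deacon.
# Assumes paired SMILES/IUPAC encoders and an InfoNCE-style loss; all sizes
# and hyperparameters below are illustrative, not the paper's settings.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TextEncoder(nn.Module):
    """Toy Transformer encoder that mean-pools tokens into one unit vector."""
    def __init__(self, vocab_size=1000, dim=128, num_layers=2, num_heads=4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        layer = nn.TransformerEncoderLayer(dim, num_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers)
        self.proj = nn.Linear(dim, dim)  # projection head into the shared space

    def forward(self, tokens):
        h = self.encoder(self.embed(tokens))           # (batch, seq, dim)
        pooled = h.mean(dim=1)                         # mean pooling over tokens
        return F.normalize(self.proj(pooled), dim=-1)  # unit-norm embeddings

def contrastive_loss(z_smiles, z_iupac, temperature=0.07):
    """Symmetric InfoNCE: matched SMILES/IUPAC pairs are positives,
    all other in-batch pairs serve as negatives."""
    logits = z_smiles @ z_iupac.t() / temperature      # (batch, batch) similarities
    targets = torch.arange(len(logits))                # positives on the diagonal
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.t(), targets)) / 2

# Usage with random token IDs standing in for tokenized molecule strings.
smiles_enc, iupac_enc = TextEncoder(), TextEncoder()
smiles_tokens = torch.randint(0, 1000, (8, 32))  # batch of 8 "SMILES" sequences
iupac_tokens = torch.randint(0, 1000, (8, 48))   # the 8 paired "IUPAC" names
loss = contrastive_loss(smiles_enc(smiles_tokens), iupac_enc(iupac_tokens))
loss.backward()
```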
Pages: 3441-3453 (13 pages)
Related Papers
50 records in total
  • [1] VarCLR: Variable Semantic Representation Pre-training via Contrastive Learning
    Chen, Qibin
    Lacomis, Jeremy
    Schwartz, Edward J.
    Neubig, Graham
    Vasilescu, Bogdan
    Le Goues, Claire
    2022 ACM/IEEE 44TH INTERNATIONAL CONFERENCE ON SOFTWARE ENGINEERING (ICSE 2022), 2022, : 2327 - 2339
  • [2] Discovering Representation Sprachbund For Multilingual Pre-Training
    Fan, Yimin
    Liang, Yaobo
    Muzio, Alexandre
    Hassan, Hany
    Li, Houqiang
    Zhou, Ming
    Duan, Nan
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, EMNLP 2021, 2021, : 881 - 894
  • [3] Multilingual Pre-training Model-Assisted Contrastive Learning Neural Machine Translation
    Sun, Shuo
    Hou, Hong-xu
    Yang, Zong-heng
    Wang, Yi-song
    2023 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS, IJCNN, 2023,
  • [4] Contrastive Pre-training with Adversarial Perturbations for Check-in Sequence Representation Learning
    Gong, Letian
    Lin, Youfang
    Guo, Shengnan
    Lin, Yan
    Wang, Tianyi
    Zheng, Erwen
    Zhou, Zeyu
    Wan, Huaiyu
    THIRTY-SEVENTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 37 NO 4, 2023, : 4276 - 4283
  • [5] A Multi-view Molecular Pre-training with Generative Contrastive Learning
    Liu, Yunwu
    Zhang, Ruisheng
    Yuan, Yongna
    Ma, Jun
    Li, Tongfeng
    Yu, Zhixuan
    INTERDISCIPLINARY SCIENCES-COMPUTATIONAL LIFE SCIENCES, 2024, 16 (03) : 741 - 754
  • [6] Multilingual Pre-training with Universal Dependency Learning
    Sun, Kailai
    Li, Zuchao
    Zhao, Hai
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021), 2021, 34
  • [7] Robust Pre-Training by Adversarial Contrastive Learning
    Jiang, Ziyu
    Chen, Tianlong
    Chen, Ting
    Wang, Zhangyang
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 33, NEURIPS 2020, 2020, 33
  • [8] MoleMCL: a multi-level contrastive learning framework for molecular pre-training
    Zhang, Xinyi
    Xu, Yanni
    Jiang, Changzhi
    Shen, Lian
    Liu, Xiangrong
    BIOINFORMATICS, 2024, 40 (04)
  • [9] Learning Transferable User Representations with Sequential Behaviors via Contrastive Pre-training
    Cheng, Mingyue
    Yuan, Fajie
    Liu, Qi
    Xin, Xin
    Chen, Enhong
    2021 21ST IEEE INTERNATIONAL CONFERENCE ON DATA MINING (ICDM 2021), 2021, : 51 - 60
  • [10] Image Difference Captioning with Pre-training and Contrastive Learning
    Yao, Linli
    Wang, Weiying
    Jin, Qin
    THIRTY-SIXTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FOURTH CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE / THE TWELFTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2022, : 3108 - 3116