Multilingual Molecular Representation Learning via Contrastive Pre-training

Cited by: 0
Authors:
Guo, Zhihui [1 ]
Sharma, Pramod [1 ]
Martinez, Andy [1 ]
Du, Liang [1 ]
Abraham, Robin [1 ]
Affiliations:
[1] Microsoft Corp, Redmond, WA 98052 USA
Keywords:
DESCRIPTORS; SIMILARITY
DOI:
Not available
CLC Number:
TP18 [Artificial Intelligence Theory]
Discipline Codes:
081104; 0812; 0835; 1405
Abstract:
Molecular representation learning plays an essential role in cheminformatics. Recently, language-model-based approaches have gained popularity as an alternative to traditional expert-designed features for encoding molecules. However, these approaches utilize only a single molecular language for representation learning. Motivated by the fact that a given molecule can be described in different languages, such as the Simplified Molecular-Input Line-Entry System (SMILES), International Union of Pure and Applied Chemistry (IUPAC) nomenclature, and the IUPAC International Chemical Identifier (InChI), we propose a multilingual molecular embedding generation approach called MM-Deacon (multilingual molecular domain embedding analysis via contrastive learning). MM-Deacon is pre-trained on large-scale molecule data using SMILES and IUPAC as two different languages. We evaluated the robustness of our method on seven molecular property prediction tasks from the MoleculeNet benchmark, zero-shot cross-lingual retrieval, and a drug-drug interaction prediction task.
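The abstract describes a dual-encoder contrastive pre-training scheme in which the SMILES and IUPAC views of the same molecule are pulled together in a shared embedding space. Below is a minimal PyTorch sketch of such a symmetric InfoNCE objective with in-batch negatives; the Projector module, the pooled encoder stand-ins, and the temperature value are illustrative assumptions, not the paper's exact configuration.

import torch
import torch.nn.functional as F

class Projector(torch.nn.Module):
    """Maps a pooled encoder output into the shared embedding space
    (a hypothetical stand-in for a per-language projection head)."""
    def __init__(self, in_dim: int, emb_dim: int):
        super().__init__()
        self.fc = torch.nn.Linear(in_dim, emb_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return F.normalize(self.fc(x), dim=-1)  # unit-norm embeddings

def info_nce(z_s: torch.Tensor, z_i: torch.Tensor,
             temperature: float = 0.07) -> torch.Tensor:
    """Symmetric contrastive loss: the (SMILES, IUPAC) pair describing the
    same molecule is the positive; all other in-batch pairs are negatives."""
    logits = z_s @ z_i.t() / temperature             # (B, B) similarity matrix
    targets = torch.arange(z_s.size(0), device=z_s.device)
    loss_s2i = F.cross_entropy(logits, targets)      # SMILES -> IUPAC direction
    loss_i2s = F.cross_entropy(logits.t(), targets)  # IUPAC -> SMILES direction
    return 0.5 * (loss_s2i + loss_i2s)

# Usage with random tensors standing in for pooled transformer outputs:
B, H, D = 32, 768, 512
pooled_smiles = torch.randn(B, H)  # assumed output of a SMILES encoder
pooled_iupac = torch.randn(B, H)   # assumed output of an IUPAC encoder
proj_s, proj_i = Projector(H, D), Projector(H, D)
loss = info_nce(proj_s(pooled_smiles), proj_i(pooled_iupac))
loss.backward()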
Pages: 3441-3453
Page count: 13
Related Papers (50 in total)
  • [41] EMMA-X: An EM-like Multilingual Pre-training Algorithm for Cross-lingual Representation Learning
    Guo, Ping
    Wei, Xiangpeng
    Hu, Yue
    Yang, Baosong
    Liu, Dayiheng
    Huang, Fei
    Xie, Jun
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023
  • [42] Non-Contrastive Learning Meets Language-Image Pre-Training
    Zhou, Jinghao
    Dong, Li
    Gan, Zhe
    Wang, Lijuan
    Wei, Furu
    2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2023, : 11028 - 11038
  • [43] Pre-training local and non-local geographical influences with contrastive learning
    Oh, Byungkook
    Suh, Ilhyun
    Cha, Kihoon
    Kim, Junbeom
    Park, Goeon
    Jeong, Sihyun
    KNOWLEDGE-BASED SYSTEMS, 2023, 259
  • [44] Contrastive Ground-Level Image and Remote Sensing Pre-training Improves Representation Learning for Natural World Imagery
    Huynh, Andy V.
    Gillespie, Lauren E.
    Lopez-Saucedo, Jael
    Tang, Claire
    Sikand, Rohan
    Exposito-Alonso, Moises
    COMPUTER VISION - ECCV 2024, PT LXXX, 2025, 15138 : 173 - 190
  • [45] Pre-training Universal Language Representation
    Li, Yian
    Zhao, Hai
    59TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS AND THE 11TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING (ACL-IJCNLP 2021), VOL 1, 2021, : 5122 - 5133
  • [46] Multilingual Denoising Pre-training for Neural Machine Translation
    Liu, Yinhan
    Gu, Jiatao
    Goyal, Naman
    Li, Xian
    Edunov, Sergey
    Ghazvininejad, Marjan
    Lewis, Mike
    Zettlemoyer, Luke
    TRANSACTIONS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, 2020, 8 : 726 - 742
  • [47] Contrastive Pre-training and Representation Distillation for Medical Visual Question Answering Based on Radiology Images
    Liu, Bo
    Zhan, Li-Ming
    Wu, Xiao-Ming
    MEDICAL IMAGE COMPUTING AND COMPUTER ASSISTED INTERVENTION - MICCAI 2021, PT II, 2021, 12902 : 210 - 220
  • [48] Supervised contrastive pre-training models for mammography screening
    Cao, Zhenjie
    Deng, Zhuo
    Yang, Zhicheng
    Ma, Jie
    Ma, Lan
    JOURNAL OF BIG DATA, 2025, 12 (01)
  • [49] Contrastive Language-knowledge Graph Pre-training
    Yuan, Xiaowei
    Liu, Kang
    Wang, Yequan
    ACM TRANSACTIONS ON ASIAN AND LOW-RESOURCE LANGUAGE INFORMATION PROCESSING, 2024, 23 (04)
  • [50] Multi-Modal Contrastive Pre-training for Recommendation
    Liu, Zhuang
    Ma, Yunpu
    Schubert, Matthias
    Ouyang, Yuanxin
    Xiong, Zhang
    PROCEEDINGS OF THE 2022 INTERNATIONAL CONFERENCE ON MULTIMEDIA RETRIEVAL, ICMR 2022, 2022, : 99 - 108