Multilingual Molecular Representation Learning via Contrastive Pre-training

Cited by: 0
Authors
Guo, Zhihui [1 ]
Sharma, Pramod [1 ]
Martinez, Andy [1 ]
Du, Liang [1 ]
Abraham, Robin [1 ]
Affiliations
[1] Microsoft Corp, Redmond, WA 98052 USA
Keywords
DESCRIPTORS; SIMILARITY;
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Molecular representation learning plays an essential role in cheminformatics. Recently, language model-based approaches have gained popularity as an alternative to traditional expert-designed features for encoding molecules. However, these approaches utilize only a single molecular language for representation learning. Motivated by the fact that a given molecule can be described in different languages, such as the Simplified Molecular-Input Line-Entry System (SMILES), International Union of Pure and Applied Chemistry (IUPAC) nomenclature, and the IUPAC International Chemical Identifier (InChI), we propose a multilingual molecular embedding generation approach called MM-Deacon (multilingual molecular domain embedding analysis via contrastive learning). MM-Deacon is pre-trained on large-scale molecules using SMILES and IUPAC as two different languages. We evaluated the robustness of our method on seven molecular property prediction tasks from the MoleculeNet benchmark, on zero-shot cross-lingual retrieval, and on a drug-drug interaction prediction task.
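To make the core idea of the abstract concrete, the following is a minimal, hypothetical sketch of cross-lingual contrastive pre-training in the spirit described above: two encoders (one per molecular language) are trained so that the SMILES and IUPAC embeddings of the same molecule agree, while other molecules in the batch act as negatives. The encoder architecture, pooling, projection head, temperature value, and all names (MoleculeEncoder, symmetric_info_nce) are illustrative assumptions, not the authors' exact configuration.

```python
# Hypothetical sketch of dual-encoder contrastive pre-training over two
# molecular languages (SMILES and IUPAC). Sizes and hyperparameters are
# illustrative assumptions, not the published MM-Deacon settings.
import torch
import torch.nn as nn
import torch.nn.functional as F


class MoleculeEncoder(nn.Module):
    """Toy transformer encoder mapping a token-ID sequence to one embedding."""

    def __init__(self, vocab_size: int, dim: int = 256, num_layers: int = 2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)
        self.proj = nn.Linear(dim, dim)  # projection head into the contrastive space

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        h = self.encoder(self.embed(token_ids))        # (batch, seq_len, dim)
        pooled = h.mean(dim=1)                         # mean-pool over tokens
        return F.normalize(self.proj(pooled), dim=-1)  # unit-length embeddings


def symmetric_info_nce(z_smiles: torch.Tensor, z_iupac: torch.Tensor,
                       temperature: float = 0.07) -> torch.Tensor:
    """InfoNCE in both directions: matching SMILES/IUPAC pairs are positives,
    all other molecules in the batch serve as negatives."""
    logits = z_smiles @ z_iupac.t() / temperature      # (batch, batch) similarities
    targets = torch.arange(logits.size(0))             # diagonal = positive pairs
    loss_s2i = F.cross_entropy(logits, targets)
    loss_i2s = F.cross_entropy(logits.t(), targets)
    return 0.5 * (loss_s2i + loss_i2s)


if __name__ == "__main__":
    # Dummy batch of 8 molecules, each given as pre-tokenized SMILES and IUPAC IDs.
    smiles_ids = torch.randint(0, 100, (8, 32))
    iupac_ids = torch.randint(0, 200, (8, 48))

    smiles_enc = MoleculeEncoder(vocab_size=100)
    iupac_enc = MoleculeEncoder(vocab_size=200)

    loss = symmetric_info_nce(smiles_enc(smiles_ids), iupac_enc(iupac_ids))
    loss.backward()  # one contrastive pre-training step (optimizer omitted)
    print(f"contrastive loss: {loss.item():.4f}")
```

The resulting language-agnostic embeddings can then be reused for downstream tasks such as the property prediction, cross-lingual retrieval, and drug-drug interaction experiments mentioned in the abstract.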
Pages: 3441 - 3453
Number of pages: 13
Related papers
50 records in total
  • [31] Contrastive Pre-Training of GNNs on Heterogeneous Graphs
    Jiang, Xunqiang
    Lu, Yuanfu
    Fang, Yuan
    Shi, Chuan
    PROCEEDINGS OF THE 30TH ACM INTERNATIONAL CONFERENCE ON INFORMATION & KNOWLEDGE MANAGEMENT, CIKM 2021, 2021, : 803 - 812
  • [32] Multilingual Translation from Denoising Pre-Training
    Tang, Yuqing
    Tran, Chau
    Li, Xian
    Chen, Peng-Jen
    Goyal, Naman
    Chaudhary, Vishrav
    Gu, Jiatao
    Fan, Angela
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, ACL-IJCNLP 2021, 2021, : 3450 - 3466
  • [33] Contrastive Code-Comment Pre-training
    Pei, Xiaohuan
    Liu, Daochang
    Qian, Luo
    Xu, Chang
    2022 IEEE INTERNATIONAL CONFERENCE ON DATA MINING (ICDM), 2022, : 398 - 407
  • [34] Contrastive Pre-training for Personalized Expert Finding
    Peng, Qiyao
    Liu, Hongtao
    Lv, Zhepeng
    Ng, Qingay
    Wang, Wenjun
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (EMNLP 2023), 2023, : 15797 - 15806
  • [35] Temporal Contrastive Pre-Training for Sequential Recommendation
    Tian, Changxin
    Lin, Zihan
    Bian, Shuqing
    Wang, Jinpeng
    Zhao, Wayne Xin
    PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON INFORMATION AND KNOWLEDGE MANAGEMENT, CIKM 2022, 2022, : 1925 - 1934
  • [36] TRANSFORMER BASED UNSUPERVISED PRE-TRAINING FOR ACOUSTIC REPRESENTATION LEARNING
    Zhang, Ruixiong
    Wu, Haiwei
    Li, Wubo
    Jiang, Dongwei
    Zou, Wei
    Li, Xiangang
    2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 6933 - 6937
  • [37] RePreM: Representation Pre-training with Masked Model for Reinforcement Learning
    Cai, Yuanying
    Zhang, Chuheng
    Shen, Wei
    Zhang, Xuyun
    Ruan, Wenjie
    Huang, Longbo
    THIRTY-SEVENTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 37 NO 6, 2023, : 6879 - 6887
  • [38] ConCL: Concept Contrastive Learning for Dense Prediction Pre-training in Pathology Images
    Yang, Jiawei
    Chen, Hanbo
    Liang, Yuan
    Huang, Junzhou
    He, Lei
    Yao, Jianhua
    COMPUTER VISION, ECCV 2022, PT XXI, 2022, 13681 : 523 - 539
  • [39] Learning Visual Robotic Control Efficiently with Contrastive Pre-training and Data Augmentation
    Zhan, Albert
    Zhao, Ruihan
    Pinto, Lerrel
    Abbeel, Pieter
    Laskin, Michael
    2022 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS (IROS), 2022, : 4040 - 4047
  • [40] Learning Visual Prior via Generative Pre-Training
    Xie, Jinheng
    Ye, Kai
    Li, Yudong
    Li, Yuexiang
    Lin, Kevin Qinghong
    Zheng, Yefeng
    Shen, Linlin
    Shou, Mike Zheng
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,