Learning Invariant Molecular Representation in Latent Discrete Space

Cited by: 0
Authors
Zhuang, Xiang [1, 2, 3]
Zhang, Qiang [1, 2, 3]
Ding, Keyan [2]
Bian, Yatao [4]
Wang, Xiao [5]
Lv, Jingsong [6]
Chen, Hongyang [6]
Chen, Huajun [1, 2, 3]
Affiliations
[1] Zhejiang Univ, Coll Comp Sci & Technol, Hangzhou, Peoples R China
[2] ZJU Hangzhou Global Sci & Technol Innovat Ctr, Hangzhou, Peoples R China
[3] Zhejiang Univ Ant Grp Joint Lab Knowledge Graph, Hangzhou, Peoples R China
[4] Tencent AI Lab, Shenzhen, Peoples R China
[5] Beihang Univ, Sch Software, Beijing, Peoples R China
[6] Zhejiang Lab, Hangzhou, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
DESIGN;
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Theory of artificial intelligence];
Subject classification codes
081104; 0812; 0835; 1405;
Abstract
Molecular representation learning lays the foundation for drug discovery. However, existing methods suffer from poor out-of-distribution (OOD) generalization, particularly when training and test data originate from different environments. To address this issue, we propose a new framework for learning molecular representations that are invariant and robust to distribution shifts. Specifically, we propose a "first-encoding-then-separation" strategy that identifies invariant molecular features in the latent space, which departs from conventional practice. Before the separation step, we introduce a residual vector quantization module that mitigates overfitting to the training data distribution while preserving the expressivity of the encoders. Furthermore, we design a task-agnostic self-supervised learning objective that encourages precise identification of invariant features, which makes our method applicable to a wide variety of tasks, such as regression and multi-label classification. Extensive experiments on 18 real-world molecular datasets demonstrate that our model generalizes better than state-of-the-art baselines under various distribution shifts. Our code is available at https://github.com/HICAI-ZJU/iMoLD.
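The abstract does not describe how the residual vector quantization module is built. As a hedged illustration of the general technique only, the sketch below implements standard multi-stage residual vector quantization in plain Python. Each stage snaps the current residual to its nearest codeword, and the next stage quantizes whatever remains. The function name `residual_quantize`, the toy `CODEBOOKS`, and the absence of codebook learning are all assumptions for illustration. The authors' module (see the linked repository) may be constructed differently, for example by combining the quantized and continuous embeddings.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def _sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def residual_quantize(
    x: Sequence[float], codebooks: Sequence[Sequence[Sequence[float]]]
) -> Tuple[List[int], Vector, Vector]:
    """Hypothetical helper: multi-stage residual vector quantization.

    Returns (chosen codeword index per stage, quantized vector, final residual).
    Lookup only: no codebook training or straight-through gradients.
    """
    residual = list(x)
    quantized = [0.0] * len(x)
    indices: List[int] = []
    for codebook in codebooks:
        # Choose the codeword closest to the part of x not yet explained.
        dists = [_sq_dist(residual, code) for code in codebook]
        idx = dists.index(min(dists))
        code = codebook[idx]
        # Accumulate the chosen codeword and pass the remainder to the next stage.
        quantized = [q + c for q, c in zip(quantized, code)]
        residual = [r - c for r, c in zip(residual, code)]
        indices.append(idx)
    return indices, quantized, residual


# Toy codebooks (hypothetical): a coarse stage followed by a finer stage.
CODEBOOKS = [
    [[0.0, 0.0], [1.0, 1.0]],
    [[0.0, 0.0], [0.25, -0.25]],
]

if __name__ == "__main__":
    idx, q, r = residual_quantize([1.2, 0.8], CODEBOOKS)
    print(idx, q)  # [1, 1] [1.25, 0.75]
```

By construction, the quantized vector plus the final residual reconstructs the input, so adding stages trades a larger discrete code for a smaller approximation error.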
Pages: 18