Learning Invariant Molecular Representation in Latent Discrete Space

Citations: 0

Authors:

Zhuang, Xiang [1,2,3]

Zhang, Qiang [1,2,3]

Ding, Keyan [2]

Bian, Yatao [4]

Wang, Xiao [5]

Lv, Jingsong [6]

Chen, Hongyang [6]

Chen, Huajun [1,2,3]

Affiliations:

[1] Zhejiang Univ, Coll Comp Sci & Technol, Hangzhou, Peoples R China

[2] ZJU Hangzhou Global Sci & Technol Innovat Ctr, Hangzhou, Peoples R China

[3] Zhejiang Univ Ant Grp Joint Lab Knowledge Graph, Hangzhou, Peoples R China

[4] Tencent AI Lab, Shenzhen, Peoples R China

[5] Beihang Univ, Sch Software, Beijing, Peoples R China

[6] Zhejiang Lab, Hangzhou, Peoples R China

Source:

ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023) | 2023

Funding:

National Natural Science Foundation of China;

Keywords:

DESIGN;

DOI:

Not available

Chinese Library Classification number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Molecular representation learning lays the foundation for drug discovery. However, existing methods suffer from poor out-of-distribution (OOD) generalization, particularly when the training and testing data originate from different environments. To address this issue, we propose a new framework for learning molecular representations that are invariant and robust to distribution shifts. Specifically, we propose a strategy called "first-encoding-then-separation" to identify invariant molecule features in the latent space, which deviates from conventional practices. Prior to the separation step, we introduce a residual vector quantization module that mitigates over-fitting to the training data distribution while preserving the expressivity of the encoders. Furthermore, we design a task-agnostic self-supervised learning objective to encourage precise invariance identification, which makes our method applicable to a variety of tasks, such as regression and multi-label classification. Extensive experiments on 18 real-world molecular datasets demonstrate that our model generalizes better than state-of-the-art baselines in the presence of various distribution shifts. Our code is available at https://github.com/HICAI-ZJU/iMoLD.
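
The abstract does not spell out how the residual vector quantization module works. As a rough illustration only, the sketch below assumes one plausible reading: each continuous embedding is snapped to its nearest entry in a codebook, and the continuous embedding is then added back as a residual connection, so the discrete code constrains the representation without discarding the encoder's output. The function names, the toy codebook, and the example embedding are all hypothetical and are not taken from the paper or its repository.

```python
def nearest_code(codebook, vec):
    """Return the codebook entry closest to vec in squared Euclidean distance."""
    return min(
        codebook,
        key=lambda code: sum((c - v) ** 2 for c, v in zip(code, vec)),
    )


def quantize_with_residual(vec, codebook):
    """Snap vec to its nearest code, then add the continuous vec back.

    Assumption: the "residual" part is a skip connection around the
    quantizer; the abstract itself does not confirm this detail.
    """
    code = nearest_code(codebook, vec)
    return [v + c for v, c in zip(vec, code)]


# Hypothetical 2-D codebook and embedding, chosen only for illustration.
codebook = [[0.0, 0.0], [1.0, 1.0], [-1.0, 1.0]]
embedding = [0.9, 1.2]
output = quantize_with_residual(embedding, codebook)
```

In a trained model the codebook would be learned jointly with the encoder; here it is fixed so that the lookup and the residual addition are easy to follow.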

Pages: 18
