Multi-modal molecule structure-text model for text-based retrieval and editing

Times cited: 26

Authors:

Liu, Shengchao [1,2]

Nie, Weili [3]

Wang, Chengpeng [4]

Lu, Jiarui [1,2]

Qiao, Zhuoran [5]

Liu, Ling [6]

Tang, Jian [1,7]

Xiao, Chaowei [3,8]

Anandkumar, Animashree [3,5]

Affiliations:

[1] Mila Quebec Artificial Intelligence Inst, Montreal, PQ, Canada

[2] Univ Montreal, Montreal, PQ, Canada

[3] NVIDIA Res, Santa Clara, CA 95051 USA

[4] Univ Illinois, Champaign, IL USA

[5] CALTECH, Pasadena, CA 91125 USA

[6] Princeton Univ, Princeton, NJ USA

[7] HEC Montreal, Montreal, PQ, Canada

[8] Arizona State Univ, Tempe, AZ USA

Source:

NATURE MACHINE INTELLIGENCE | 2023 / Vol. 5 / No. 12

Keywords:

DRUG; SIMILARITY; DISCOVERY; CHEMISTRY; AREA; ZINC

DOI:

10.1038/s42256-023-00759-6

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

There is increasing adoption of artificial intelligence in drug discovery. However, existing machine learning studies mainly use the chemical structures of molecules and ignore the vast textual knowledge available in chemistry. Incorporating textual knowledge enables us to realize new drug design objectives, adapt to text-based instructions and predict complex biological activities. Here we present MoleculeSTM, a multi-modal molecule structure-text model that jointly learns molecules' chemical structures and textual descriptions via a contrastive learning strategy. To train MoleculeSTM, we construct a large multi-modal dataset, PubChemSTM, with over 280,000 chemical structure-text pairs. To demonstrate the effectiveness and utility of MoleculeSTM, we design two challenging zero-shot tasks based on text instructions: structure-text retrieval and molecule editing. MoleculeSTM has two main properties: open vocabulary and compositionality via natural language. In experiments, MoleculeSTM achieves state-of-the-art generalization to novel biochemical concepts across various benchmarks.

Machine learning methods in cheminformatics have made great progress in using the chemical structures of molecules, but a large portion of textual information remains scarcely explored. Liu and colleagues trained MoleculeSTM, a foundation model that aligns the structure and text modalities through contrastive learning, and show its utility on the downstream tasks of structure-text retrieval, text-guided editing and molecular property prediction.
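The abstract states that structure and text are aligned "via a contrastive learning strategy" and that retrieval is done zero-shot. As a rough illustration only, the sketch below shows a generic CLIP-style symmetric InfoNCE objective over paired embeddings, plus cosine-similarity retrieval, in pure Python. It assumes both encoders have already mapped molecules and texts into a shared embedding space of equal dimension. The function names, the temperature of 0.1 and the toy vectors are hypothetical choices for illustration, not the MoleculeSTM implementation or its exact loss.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def l2_normalize(v: Vector) -> List[float]:
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def similarity_logits(mol_emb: List[Vector], txt_emb: List[Vector],
                      temperature: float = 0.1) -> List[List[float]]:
    """Entry [i][j]: cosine similarity of molecule i and text j, divided by the temperature.

    The temperature value is an illustrative assumption.
    """
    mols = [l2_normalize(m) for m in mol_emb]
    txts = [l2_normalize(t) for t in txt_emb]
    return [[dot(m, t) / temperature for t in txts] for m in mols]


def diagonal_cross_entropy(logits: List[List[float]]) -> float:
    """Mean softmax cross-entropy where the correct class of row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        peak = max(row)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = peak + math.log(sum(math.exp(x - peak) for x in row))
        total += log_sum_exp - row[i]
    return total / len(logits)


def symmetric_info_nce(mol_emb: List[Vector], txt_emb: List[Vector],
                       temperature: float = 0.1) -> float:
    """Average of the molecule-to-text and text-to-molecule contrastive losses.

    Pair i (mol_emb[i], txt_emb[i]) is the positive; every other pairing
    in the batch acts as a negative.
    """
    logits = similarity_logits(mol_emb, txt_emb, temperature)
    transposed = [list(col) for col in zip(*logits)]
    return 0.5 * (diagonal_cross_entropy(logits) + diagonal_cross_entropy(transposed))


def retrieve_text(query_mol: Vector, txt_emb: List[Vector]) -> int:
    """Zero-shot retrieval: index of the candidate text most similar to the query molecule."""
    q = l2_normalize(query_mol)
    scores = [dot(q, l2_normalize(t)) for t in txt_emb]
    return max(range(len(scores)), key=scores.__getitem__)


# Toy example with hand-picked 2-D embeddings (illustrative only).
mols = [[1.0, 0.0], [0.0, 1.0]]
texts = [[0.9, 0.1], [0.1, 0.9]]
aligned_loss = symmetric_info_nce(mols, texts)
mismatched_loss = symmetric_info_nce(mols, texts[::-1])
print(aligned_loss < mismatched_loss)  # True: correctly paired embeddings give a lower loss
print(retrieve_text([0.0, 1.0], texts))  # 1
```

Training would push the encoders to minimize this loss, which raises the similarity of each true structure-text pair relative to the other pairs in the batch. Once the spaces are aligned, retrieval reduces to a nearest-neighbour search by cosine similarity.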


Pages: 1447-1457

Page count: 11

