A Systematic Survey of Chemical Pre-trained Models

Cited by: 0

Authors:

Xia, Jun [1]

Zhu, Yanqiao [2]

Du, Yuanqi [3]

Li, Stan Z. [1]

Affiliations:

[1] Westlake Univ, Res Ctr Ind Future, Hangzhou, Peoples R China

[2] Univ Calif Los Angeles, Los Angeles, CA USA

[3] Cornell Univ, Ithaca, NY USA

Source:

PROCEEDINGS OF THE THIRTY-SECOND INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, IJCAI 2023 | 2023

Funding:

National Key R&D Program of China; National Natural Science Foundation of China

Keywords:

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Deep learning has achieved remarkable success in learning representations for molecules, which is crucial for various biochemical applications, ranging from property prediction to drug design. However, training Deep Neural Networks (DNNs) from scratch often requires abundant labeled molecules, which are expensive to acquire in the real world. To alleviate this issue, tremendous efforts have been devoted to Chemical Pre-trained Models (CPMs), where DNNs are pre-trained using large-scale unlabeled molecular databases and then fine-tuned over specific downstream tasks. Despite this rapid progress, a systematic review of this fast-growing field is still lacking. In this paper, we present the first survey that summarizes the current progress of CPMs. We first highlight the limitations of training molecular representation models from scratch to motivate CPM studies. Next, we systematically review recent advances on this topic from several key perspectives, including molecular descriptors, encoder architectures, pre-training strategies, and applications. We also highlight the challenges and promising avenues for future research, providing a useful resource for both the machine learning and scientific communities.


Pages: 6787 - 6795

Page count: 9
