Universal embedding for pre-trained models and data bench

Cited: 0
Authors
Cho, Namkyeong [1 ]
Cho, Taewon [2 ]
Shin, Jaesun [2 ]
Jeon, Eunjoo [2 ]
Lee, Taehee [2 ]
Affiliations
[1] Pohang Univ Sci & Technol POSTECH, Ctr Math Machine Learning & its Applicat CM2LA, Dept Math, Pohang 37673, Gyeongbuk, South Korea
[2] Samsung SDS, 125 Olymp Ro 35 Gil, Seoul 05510, South Korea
Funding
National Research Foundation of Singapore;
Keywords
Transfer learning; Pretrained models; Graph neural networks;
DOI
10.1016/j.neucom.2024.129107
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The transformer architecture has substantially improved performance on a wide range of natural language processing (NLP) tasks. One of the great advantages of transformer-based models is that, rather than requiring a separate architecture for each task, they allow an extra layer to be added to a pre-trained model (PTM) and fine-tuned. Because this approach has delivered strong performance on NLP tasks, selecting an appropriate PTM from a model zoo such as Hugging Face becomes a crucial task. Despite its importance, PTM selection still requires further investigation. The main challenge in PTM selection for NLP tasks is the lack of a publicly available benchmark for evaluating model performance on each task and dataset. To address this challenge, we introduce the first public data benchmark to evaluate the performance of popular transformer-based models on a diverse range of NLP tasks. Furthermore, we propose graph representations of transformer-based models with node features that represent the weight matrix of each layer. Empirical results demonstrate that our proposed graph neural network (GNN) model outperforms existing PTM selection methods.
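The core idea in the abstract, encoding a PTM as a graph whose nodes carry features derived from each layer's weight matrix, can be illustrated with a minimal sketch. This is a hypothetical toy construction, not the authors' implementation: each layer becomes a node whose feature vector is a few summary statistics of its weight matrix (shape, mean, standard deviation), and consecutive layers are linked by edges.

```python
import math

def layer_features(weights):
    """Summarize one layer's weight matrix (a list of rows) as a small
    feature vector: (rows, cols, mean, std). A hypothetical stand-in
    for the paper's weight-derived node features."""
    rows, cols = len(weights), len(weights[0])
    flat = [w for row in weights for w in row]
    mean = sum(flat) / len(flat)
    var = sum((w - mean) ** 2 for w in flat) / len(flat)
    return (rows, cols, mean, math.sqrt(var))

def model_to_graph(layers):
    """Turn an ordered list of layer weight matrices into a graph:
    one feature node per layer, edges between consecutive layers."""
    nodes = [layer_features(w) for w in layers]
    edges = [(i, i + 1) for i in range(len(layers) - 1)]
    return nodes, edges

# Toy two-layer "model": a 2x2 layer followed by a 2x3 layer.
layers = [
    [[0.1, -0.2], [0.3, 0.0]],
    [[0.5, 0.5, -0.5], [0.0, 1.0, 0.0]],
]
nodes, edges = model_to_graph(layers)
```

A GNN operating on such graphs could then learn to predict transferability scores per model; the actual node features and graph topology used in the paper may differ.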
Pages: 21