Universal embedding for pre-trained models and data bench

Cited by: 0
Authors
Cho, Namkyeong [1 ]
Cho, Taewon [2 ]
Shin, Jaesun [2 ]
Jeon, Eunjoo [2 ]
Lee, Taehee [2 ]
Affiliations
[1] Pohang University of Science and Technology (POSTECH), Center for Mathematical Machine Learning and its Applications (CM2LA), Department of Mathematics, Pohang 37673, Gyeongbuk, South Korea
[2] Samsung SDS, 125 Olympic-ro 35-gil, Seoul 05510, South Korea
Funding
National Research Foundation of Singapore
Keywords
Transfer learning; Pretrained models; Graph neural networks
DOI
10.1016/j.neucom.2024.129107
CLC Number
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
The transformer architecture has yielded significant performance improvements across a wide range of natural language processing (NLP) tasks. A major advantage of transformer-based models is that an extra layer can be added to a pre-trained model (PTM) and fine-tuned, rather than developing a separate architecture for each task. Because this approach performs so well in NLP, selecting an appropriate PTM from a model zoo such as Hugging Face becomes a crucial step. Despite its importance, PTM selection has received little systematic investigation. The main challenge for PTM selection in NLP is the lack of a publicly available benchmark for evaluating model performance on each task and dataset. To address this challenge, we introduce the first public data benchmark for evaluating popular transformer-based models on a diverse range of NLP tasks. Furthermore, we propose graph representations of transformer-based models, with node features derived from the weight matrices of each layer. Empirical results demonstrate that our proposed graph neural network (GNN) model outperforms existing PTM selection methods.
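As described in the abstract, the proposed representation turns a PTM into a graph whose node features summarize per-layer weight matrices, and a GNN maps that graph to an embedding used for model selection. The sketch below is a minimal, hypothetical illustration of that idea in PyTorch: the chain topology over layers, the three summary statistics, and all function and class names are assumptions made for illustration, not the authors' implementation.

```python
# Minimal sketch (not the paper's code): represent a PTM as a chain graph
# whose nodes are its weight matrices, featurize each node with simple
# statistics, and embed the graph with a hand-rolled one-round
# message-passing GNN. All names and design choices here are illustrative.
import torch
import torch.nn as nn

def layer_features(model: nn.Module) -> torch.Tensor:
    """One node feature vector per 2-D weight matrix: [mean, std, Frobenius norm]."""
    feats = []
    for p in model.parameters():
        if p.ndim == 2:                      # weight matrices only; skip biases
            w = p.detach()
            feats.append(torch.stack([w.mean(), w.std(), w.norm()]))
    return torch.stack(feats)                # shape: (num_matrices, 3)

class ChainGNN(nn.Module):
    """One round of mean-aggregation message passing over a chain graph,
    followed by mean pooling into a single model embedding."""
    def __init__(self, in_dim: int = 3, hidden: int = 16, out_dim: int = 8):
        super().__init__()
        self.msg = nn.Linear(in_dim, hidden)
        self.out = nn.Linear(hidden, out_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        n = x.size(0)
        adj = torch.eye(n)                   # self-loops
        idx = torch.arange(n - 1)
        adj[idx, idx + 1] = 1.0              # edge: layer i -> layer i+1
        adj[idx + 1, idx] = 1.0              # edge: layer i+1 -> layer i
        adj = adj / adj.sum(dim=1, keepdim=True)  # row-normalize (mean agg.)
        h = torch.relu(self.msg(adj @ x))    # aggregate neighbors + transform
        return self.out(h.mean(dim=0))       # pool nodes -> graph embedding

# Usage: embed a toy "PTM" and score it against a placeholder dataset
# embedding, e.g. to rank candidate models for transfer by cosine similarity.
ptm = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 32))
model_emb = ChainGNN()(layer_features(ptm))
data_emb = torch.randn(8)                    # stand-in for a dataset embedding
score = torch.cosine_similarity(model_emb, data_emb, dim=0)
print(f"transferability score: {score.item():.3f}")
```

In the paper's setting, such model embeddings would presumably be trained jointly with dataset embeddings on the benchmark so that their similarity predicts fine-tuned performance; the random dataset embedding and cosine score above are only stand-ins.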
Pages: 21