Improving Pretrained Language Model Fine-Tuning With Noise Stability Regularization

Cited by: 2
Authors
Hua, Hang [1 ]
Li, Xingjian [2 ]
Dou, Dejing [3 ]
Xu, Cheng-Zhong [4 ]
Luo, Jiebo [1 ]
Affiliations
[1] Univ Rochester, Dept Comp Sci, Rochester, NY 14627 USA
[2] Carnegie Mellon Univ, Computat Biol Dept, Pittsburgh, PA 15213 USA
[3] BCG Greater China, Beijing 100027, Peoples R China
[4] Univ Macau, State Key Lab IOTSC, Fac Sci & Technol, Macau, Peoples R China
Keywords
Stability analysis; Task analysis; Training; Transformers; Gaussian distribution; Standards; Optimization; Domain generalization; fine-tuning; in-domain generalization; pretrained language models (PLMs); regularization; NEURAL-NETWORKS
DOI
10.1109/TNNLS.2023.3330926
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
The advent of large-scale pretrained language models (PLMs) has contributed greatly to progress in natural language processing (NLP). Despite their recent success and wide adoption, fine-tuning a PLM often suffers from overfitting, which leads to poor generalizability due to the extremely high complexity of the model and the limited training samples from downstream tasks. To address this problem, we propose a novel and effective fine-tuning framework, named layerwise noise stability regularization (LNSR). Specifically, our method perturbs the input of neural networks with standard Gaussian or in-manifold noise in the representation space and regularizes each layer's output of the language model. We provide theoretical and experimental analyses to demonstrate the effectiveness of our method. The empirical results show that our proposed method outperforms several state-of-the-art algorithms, such as the L2 penalty toward the starting point (L2-SP), Mixout, FreeLB, and smoothness-inducing adversarial regularization with Bregman proximal point optimization (SMART). In addition to evaluating the proposed method on relatively simple text classification tasks, as in prior works, we further evaluate its effectiveness on more challenging question-answering (QA) tasks, which are more difficult and provide a larger number of training examples for tuning a well-generalized model. Furthermore, the empirical results indicate that our proposed method improves the domain generalization ability of language models.
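The abstract describes LNSR only at a high level, so the following PyTorch sketch illustrates the general idea under stated assumptions: perturb a hidden representation with standard Gaussian noise and penalize the change this induces in every layer's output, adding that penalty to the ordinary task loss. The tiny feed-forward encoder, the helper lnsr_penalty, the noise scale noise_std, and the weight lnsr_lambda are illustrative placeholders, not the authors' implementation.

# Minimal LNSR-style sketch (assumed setup, not the paper's code): add Gaussian
# noise to a representation and regularize each layer's output toward its clean value.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyEncoder(nn.Module):
    """Stand-in for a pretrained encoder: a small stack of feed-forward blocks."""
    def __init__(self, dim=128, num_layers=4):
        super().__init__()
        self.layers = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, dim), nn.GELU()) for _ in range(num_layers)
        )

    def forward(self, x):
        outputs = []                      # collect every layer's output
        for layer in self.layers:
            x = layer(x)
            outputs.append(x)
        return outputs

def lnsr_penalty(encoder, hidden, noise_std=1.0):
    """L2 distance between clean and noise-perturbed layerwise outputs."""
    clean_outputs = encoder(hidden)
    noisy_outputs = encoder(hidden + noise_std * torch.randn_like(hidden))
    return sum(
        F.mse_loss(noisy, clean.detach())  # stabilize outputs against the perturbation
        for clean, noisy in zip(clean_outputs, noisy_outputs)
    )

# Usage: combine the penalty with the ordinary task loss during fine-tuning.
encoder = TinyEncoder()
classifier = nn.Linear(128, 2)
hidden = torch.randn(8, 128)              # token/sentence representations
labels = torch.randint(0, 2, (8,))
lnsr_lambda = 0.1                          # assumed regularization weight

logits = classifier(encoder(hidden)[-1])
loss = F.cross_entropy(logits, labels) + lnsr_lambda * lnsr_penalty(encoder, hidden)
loss.backward()

The detach on the clean outputs is one possible design choice (treating the clean forward pass as the target); regularizing both passes symmetrically would be an equally plausible reading of the abstract.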
Pages: 1898-1910
Number of pages: 13
Related Papers
50 records in total
  • [41] A Rigorous Study on Named Entity Recognition: Can Fine-tuning Pretrained Model Lead to the Promised Land?
    Lin, Hongyu
    Lu, Yaojie
    Tang, Jialong
    Han, Xianpei
    Sun, Le
    Wei, Zhicheng
    Yuan, Nicholas Jing
    PROCEEDINGS OF THE 2020 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP), 2020, : 7291 - 7300
  • [42] Rebetiko Singer Identification: Fine-tuning and explaining deep pretrained transformer models
    Papakostas, Maximos Kaliakatsos
    Zacharakis, Asterios
    Velenis, Konstantinos
    Cambouropoulos, Emilios
    PROCEEDINGS OF THE 19TH INTERNATIONAL AUDIO MOSTLY CONFERENCE, AM 2024, 2024, : 285 - 291
  • [43] Fine-Tuning and the Stability of Recurrent Neural Networks
    MacNeil, David
    Eliasmith, Chris
    PLOS ONE, 2011, 6 (09):
  • [44] Guided Recommendation for Model Fine-Tuning
    Li, Hao
    Fowlkes, Charless
    Yang, Hao
    Dabeer, Onkar
    Tu, Zhuowen
    Soatto, Stefano
    2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR, 2023, : 3633 - 3642
  • [45] Model Editing by Standard Fine-Tuning
    Gangadhar, Govind
    Stratos, Karl
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: ACL 2024, 2024, : 5907 - 5913
  • [46] Balancing Speciality and Versatility: a Coarse to Fine Framework for Supervised Fine-tuning Large Language Model
    Zhang, Hengyuan
    Wu, Yanru
    Li, Dawei
    Yang, Sak
    Zhao, Rui
    Jiang, Yong
    Tan, Fei
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: ACL 2024, 2024, : 7467 - 7509
  • [47] Fine-Tuning a Large Language Model with Reinforcement Learning for Educational Question Generation
    Lamsiyah, Salima
    El Mahdaouy, Abdelkader
    Nourbakhsh, Aria
    Schommer, Christoph
    ARTIFICIAL INTELLIGENCE IN EDUCATION, PT I, AIED 2024, 2024, 14829 : 424 - 438
  • [48] WalkLM: A Uniform Language Model Fine-tuning Framework for Attributed Graph Embedding
    Tan, Yanchao
    Zhou, Zihao
    Lv, Hang
    Liu, Weiming
    Yang, Carl
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
  • [49] Efficient fine-tuning of short text classification based on large language model
    Wang, Likun
    PROCEEDINGS OF INTERNATIONAL CONFERENCE ON MODELING, NATURAL LANGUAGE PROCESSING AND MACHINE LEARNING, CMNM 2024, 2024, : 33 - 38
  • [50] Raise a Child in Large Language Model: Towards Effective and Generalizable Fine-tuning
    Xu, Runxin
    Luo, Fuli
    Zhang, Zhiyuan
    Tan, Chuanqi
    Chang, Baobao
    Huang, Songfang
    Huang, Fei
    2021 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP 2021), 2021, : 9514 - 9528