Improving Pretrained Language Model Fine-Tuning With Noise Stability Regularization

Cited by: 2
Authors
Hua, Hang [1 ]
Li, Xingjian [2 ]
Dou, Dejing [3 ]
Xu, Cheng-Zhong [4 ]
Luo, Jiebo [1 ]
Affiliations
[1] Univ Rochester, Dept Comp Sci, Rochester, NY 14627 USA
[2] Carnegie Mellon Univ, Computat Biol Dept, Pittsburgh, PA 15213 USA
[3] BCG Greater China, Beijing 100027, Peoples R China
[4] Univ Macau, State Key Lab IOTSC, Fac Sci & Technol, Macau, Peoples R China
Keywords
Stability analysis; Task analysis; Training; Transformers; Gaussian distribution; Standards; Optimization; Domain generalization; fine-tuning; in-domain generalization; pretrained language models (PLMs); regularization; NEURAL-NETWORKS
DOI
10.1109/TNNLS.2023.3330926
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
The advent of large-scale pretrained language models (PLMs) has contributed greatly to progress in natural language processing (NLP). Despite their recent success and wide adoption, fine-tuning a PLM often suffers from overfitting, which leads to poor generalizability due to the extremely high complexity of the model and the limited training samples from downstream tasks. To address this problem, we propose a novel and effective fine-tuning framework, named layerwise noise stability regularization (LNSR). Specifically, our method perturbs the input of neural networks with standard Gaussian or in-manifold noise in the representation space and regularizes each layer's output of the language model. We provide theoretical and experimental analyses to demonstrate the effectiveness of our method. The empirical results show that our proposed method outperforms several state-of-the-art algorithms, such as the L2 penalty toward the starting point (L2-SP), Mixout, FreeLB, and smoothness-inducing adversarial regularization with Bregman proximal point optimization (SMART). In addition to evaluating the proposed method on relatively simple text classification tasks, as in prior works, we further evaluate its effectiveness on more challenging question-answering (QA) tasks, which are more difficult and provide a larger number of training examples for tuning a well-generalized model. Furthermore, the empirical results indicate that our proposed method improves the domain generalization ability of language models.
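The abstract describes LNSR only at a high level, so the following PyTorch sketch illustrates the general idea under stated assumptions: perturb a hidden representation with standard Gaussian noise and penalize the change this induces in every layer's output, adding that penalty to the ordinary task loss. The tiny feed-forward encoder, the helper lnsr_penalty, the noise scale noise_std, and the weight lnsr_lambda are illustrative placeholders, not the authors' implementation.

# Minimal LNSR-style sketch (assumed setup, not the paper's code): add Gaussian
# noise to a representation and regularize each layer's output toward its clean value.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyEncoder(nn.Module):
    """Stand-in for a pretrained encoder: a small stack of feed-forward blocks."""
    def __init__(self, dim=128, num_layers=4):
        super().__init__()
        self.layers = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, dim), nn.GELU()) for _ in range(num_layers)
        )

    def forward(self, x):
        outputs = []                      # collect every layer's output
        for layer in self.layers:
            x = layer(x)
            outputs.append(x)
        return outputs

def lnsr_penalty(encoder, hidden, noise_std=1.0):
    """L2 distance between clean and noise-perturbed layerwise outputs."""
    clean_outputs = encoder(hidden)
    noisy_outputs = encoder(hidden + noise_std * torch.randn_like(hidden))
    return sum(
        F.mse_loss(noisy, clean.detach())  # stabilize outputs against the perturbation
        for clean, noisy in zip(clean_outputs, noisy_outputs)
    )

# Usage: combine the penalty with the ordinary task loss during fine-tuning.
encoder = TinyEncoder()
classifier = nn.Linear(128, 2)
hidden = torch.randn(8, 128)              # token/sentence representations
labels = torch.randint(0, 2, (8,))
lnsr_lambda = 0.1                          # assumed regularization weight

logits = classifier(encoder(hidden)[-1])
loss = F.cross_entropy(logits, labels) + lnsr_lambda * lnsr_penalty(encoder, hidden)
loss.backward()

The detach on the clean outputs is one possible design choice (treating the clean forward pass as the target); regularizing both passes symmetrically would be an equally plausible reading of the abstract.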
Pages: 1898-1910
Number of pages: 13
Related Papers
50 records in total
  • [41] A Rigorous Study on Named Entity Recognition: Can Fine-tuning Pretrained Model Lead to the Promised Land?
    Lin, Hongyu
    Lu, Yaojie
    Tang, Jialong
    Han, Xianpei
    Sun, Le
    Wei, Zhicheng
    Yuan, Nicholas Jing
    PROCEEDINGS OF THE 2020 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP), 2020, : 7291 - 7300
  • [42] Rebetiko Singer Identification: Fine-tuning and explaining deep pretrained transformer models
    Papakostas, Maximos Kaliakatsos
    Zacharakis, Asterios
    Velenis, Konstantinos
    Cambouropoulos, Emilios
    PROCEEDINGS OF THE 19TH INTERNATIONAL AUDIO MOSTLY CONFERENCE, AM 2024, 2024, : 285 - 291
  • [43] Fine-Tuning and the Stability of Recurrent Neural Networks
    MacNeil, David
    Eliasmith, Chris
    PLOS ONE, 2011, 6 (09):
  • [44] Guided Recommendation for Model Fine-Tuning
    Li, Hao
    Fowlkes, Charless
    Yang, Hao
    Dabeer, Onkar
    Tu, Zhuowen
    Soatto, Stefano
    2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR, 2023, : 3633 - 3642
  • [45] Model Editing by Standard Fine-Tuning
    Gangadhar, Govind
    Stratos, Karl
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: ACL 2024, 2024, : 5907 - 5913
  • [46] Balancing Speciality and Versatility: a Coarse to Fine Framework for Supervised Fine-tuning Large Language Model
    Zhang, Hengyuan
    Wu, Yanru
    Li, Dawei
    Yang, Sak
    Zhao, Rui
    Jiang, Yong
    Tan, Fei
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: ACL 2024, 2024, : 7467 - 7509
  • [47] Fine-Tuning a Large Language Model with Reinforcement Learning for Educational Question Generation
    Lamsiyah, Salima
    El Mahdaouy, Abdelkader
    Nourbakhsh, Aria
    Schommer, Christoph
    ARTIFICIAL INTELLIGENCE IN EDUCATION, PT I, AIED 2024, 2024, 14829 : 424 - 438
  • [48] WalkLM: A Uniform Language Model Fine-tuning Framework for Attributed Graph Embedding
    Tan, Yanchao
    Zhou, Zihao
    Lv, Hang
    Liu, Weiming
    Yang, Carl
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
  • [49] Efficient fine-tuning of short text classification based on large language model
    Wang, Likun
    PROCEEDINGS OF INTERNATIONAL CONFERENCE ON MODELING, NATURAL LANGUAGE PROCESSING AND MACHINE LEARNING, CMNM 2024, 2024, : 33 - 38
  • [50] Raise a Child in Large Language Model: Towards Effective and Generalizable Fine-tuning
    Xu, Runxin
    Luo, Fuli
    Zhang, Zhiyuan
    Tan, Chuanqi
    Chang, Baobao
    Huang, Songfang
    Huang, Fei
    2021 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP 2021), 2021, : 9514 - 9528