Improving Pretrained Language Model Fine-Tuning With Noise Stability Regularization

Cited by: 2
Authors
Hua, Hang [1 ]
Li, Xingjian [2 ]
Dou, Dejing [3 ]
Xu, Cheng-Zhong [4 ]
Luo, Jiebo [1 ]
Affiliations
[1] Univ Rochester, Dept Comp Sci, Rochester, NY 14627 USA
[2] Carnegie Mellon Univ, Computat Biol Dept, Pittsburgh, PA 15213 USA
[3] BCG Greater China, Beijing 100027, Peoples R China
[4] Univ Macau, State Key Lab IOTSC, Fac Sci & Technol, Macau, Peoples R China
Keywords
Stability analysis; Task analysis; Training; Transformers; Gaussian distribution; Standards; Optimization; Domain generalization; fine-tuning; in-domain generalization; pretrained language models (PLMs); regularization; NEURAL-NETWORKS;
DOI
10.1109/TNNLS.2023.3330926
CLC classification
TP18 [Artificial Intelligence Theory];
Discipline codes
081104; 0812; 0835; 1405;
Abstract
The advent of large-scale pretrained language models (PLMs) has contributed greatly to the progress in natural language processing (NLP). Despite their recent success and wide adoption, fine-tuning a PLM often suffers from overfitting, which leads to poor generalizability due to the extremely high complexity of the model and the limited training samples from downstream tasks. To address this problem, we propose a novel and effective fine-tuning framework, named layerwise noise stability regularization (LNSR). Specifically, our method perturbs the input of neural networks with standard Gaussian or in-manifold noise in the representation space and regularizes each layer's output of the language model. We provide theoretical and experimental analyses to prove the effectiveness of our method. The empirical results show that our proposed method outperforms several state-of-the-art algorithms, such as L2 norm and start point (L2-SP), Mixout, FreeLB, and smoothness-inducing adversarial regularization and Bregman proximal point optimization (SMART). In addition to evaluating the proposed method on relatively simple text classification tasks, as in prior works, we further evaluate the effectiveness of our method on more challenging question-answering (QA) tasks. These tasks present a higher level of difficulty, and they provide a larger number of training examples for tuning a well-generalized model. Furthermore, the empirical results indicate that our proposed method can improve the domain generalization ability of language models.
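The layerwise noise stability idea described in the abstract can be sketched as follows. This is a minimal NumPy toy, not the authors' implementation: a tiny two-layer MLP stands in for the PLM, and the `sigma` noise scale and the penalty weighting (commented as `lambda_reg`) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 2-layer network standing in for a pretrained language model.
W1 = rng.normal(size=(8, 16))
W2 = rng.normal(size=(16, 4))

def forward(x):
    """Return the output of every layer for input x."""
    h1 = np.tanh(x @ W1)
    h2 = np.tanh(h1 @ W2)
    return [h1, h2]

def lnsr_penalty(x, sigma=0.1):
    """Layerwise noise stability penalty: mean squared distance between
    each layer's output on the clean input and on a Gaussian-perturbed
    copy of the input (perturbation in the representation space)."""
    noise = rng.normal(scale=sigma, size=x.shape)
    clean_outputs = forward(x)
    noisy_outputs = forward(x + noise)
    return sum(np.mean((c - n) ** 2)
               for c, n in zip(clean_outputs, noisy_outputs))

x = rng.normal(size=(32, 8))   # a batch of 32 input representations
penalty = lnsr_penalty(x)
# During fine-tuning, the training objective would be:
#   total_loss = task_loss + lambda_reg * penalty
```

A stable (well-regularized) model yields a small penalty: small input perturbations barely change any layer's output, which is the generalization property the regularizer encourages.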
Pages: 1898-1910
Page count: 13
Related Papers
50 records in total
  • [31] DN at SemEval-2023 Task 12: Low-Resource Language Text Classification via Multilingual Pretrained Language Model Fine-tuning
    Daniil, Homskiy
    Narek, Maloyan
    17TH INTERNATIONAL WORKSHOP ON SEMANTIC EVALUATION, SEMEVAL-2023, 2023, : 1537 - 1541
  • [32] Multi-phase Fine-Tuning: A New Fine-Tuning Approach for Sign Language Recognition
    Sarhan, Noha
    Lauri, Mikko
    Frintrop, Simone
    KUNSTLICHE INTELLIGENZ, 2022, 36 (01): 91 - 98
  • [34] ACTUNE: Uncertainty-Based Active Self-Training for Active Fine-Tuning of Pretrained Language Models
    Yu, Yue
    Kong, Lingkai
    Zhang, Jieyu
    Zhang, Rongzhi
    Zhang, Chao
    NAACL 2022: THE 2022 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES, 2022, : 1422 - 1436
  • [35] Causal-Debias: Unifying Debiasing in Pretrained Language Models and Fine-tuning via Causal Invariant Learning
    Zhou, Fan
    Mao, Yuzhou
    Yu, Liu
    Yang, Yi
    Zhong, Ting
    PROCEEDINGS OF THE 61ST ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, ACL 2023, VOL 1, 2023, : 4227 - 4241
  • [36] Leveraging Pretrained Language Models for Enhanced Entity Matching: A Comprehensive Study of Fine-Tuning and Prompt Learning Paradigms
    Wang, Yu
    Zhou, Luyao
    Wang, Yuan
    Peng, Zhenwan
    INTERNATIONAL JOURNAL OF INTELLIGENT SYSTEMS, 2024, 2024
  • [37] Towards Adaptive Prefix Tuning for Parameter-Efficient Language Model Fine-tuning
    Zhang, Zhen-Ru
    Tan, Chuanqi
    Xu, Haiyang
    Wang, Chengyu
    Huang, Jun
    Huang, Songfang
    61ST CONFERENCE OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, ACL 2023, VOL 2, 2023, : 1239 - 1248
  • [38] MultiFiT: Efficient Multi-lingual Language Model Fine-tuning
    Eisenschlos, Julian
    Ruder, Sebastian
    Czapla, Piotr
    Kardas, Marcin
    Gugger, Sylvain
    Howard, Jeremy
    2019 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING AND THE 9TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING (EMNLP-IJCNLP 2019): PROCEEDINGS OF THE CONFERENCE, 2019, : 5702 - 5707
  • [39] Improving fine-tuning in composite Higgs models
    Banerjee, Avik
    Bhattacharyya, Gautam
    Ray, Tirtha Sankar
    PHYSICAL REVIEW D, 2017, 96 (03)
  • [40] MediBioDeBERTa: Biomedical Language Model With Continuous Learning and Intermediate Fine-Tuning
    Kim, Eunhui
    Jeong, Yuna
    Choi, Myung-Seok
    IEEE ACCESS, 2023, 11 : 141036 - 141044