Domain Knowledge Transferring for Pre-trained Language Model via Calibrated Activation Boundary Distillation

被引:0
|
作者
Choi, Dongha [1 ]
Choi, HongSeok [2 ]
Lee, Hyunju [1 ,2 ]
机构
[1] Gwangju Inst Sci & Technol, Artificial Intelligence Grad Sch, Gwangju 61005, South Korea
[2] Gwangju Inst Sci & Technol, Sch Elect Engn & Comp Sci, Gwangju 61005, South Korea
基金
新加坡国家研究基金会;
关键词
D O I
暂无
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
Since the development and wide use of pre-trained language models (PLMs), several approaches have been applied to boost their performance on downstream tasks in specific domains, such as biomedical or scientific domains. Additional pre-training with in-domain texts is the most common approach for providing domain-specific knowledge to PLMs. However, these pre-training methods require considerable in-domain data and training resources and a longer training time. Moreover, the training must be re-performed whenever a new PLM emerges. In this study, we propose a domain knowledge transferring (DoKTra) framework for PLMs without additional in-domain pretraining. Specifically, we extract the domain knowledge from an existing in-domain pre-trained language model and transfer it to other PLMs by applying knowledge distillation. In particular, we employ activation boundary distillation, which focuses on the activation of hidden neurons. We also apply an entropy regularization term in both teacher training and distillation to encourage the model to generate reliable output probabilities, and thus aid the distillation. By applying the proposed DoKTra framework to downstream tasks in the biomedical, clinical, and financial domains, our student models can retain a high percentage of teacher performance and even outperform the teachers in certain tasks.
引用
收藏
页码:1658 / 1669
页数:12
相关论文
共 50 条
  • [1] Knowledge Base Grounded Pre-trained Language Models via Distillation
    Sourty, Raphael
    Moreno, Jose G.
    Servant, Francois-Paul
    Tamine, Lynda
    39TH ANNUAL ACM SYMPOSIUM ON APPLIED COMPUTING, SAC 2024, 2024, : 1617 - 1625
  • [2] Explanation Guided Knowledge Distillation for Pre-trained Language Model Compression
    Yang, Zhao
    Zhang, Yuanzhe
    Sui, Dianbo
    Ju, Yiming
    Zhao, Jun
    Liu, Kang
    ACM TRANSACTIONS ON ASIAN AND LOW-RESOURCE LANGUAGE INFORMATION PROCESSING, 2024, 23 (02)
  • [3] Dynamic Knowledge Distillation for Pre-trained Language Models
    Li, Lei
    Lin, Yankai
    Ren, Shuhuai
    Li, Peng
    Zhou, Jie
    Sun, Xu
    2021 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP 2021), 2021, : 379 - 389
  • [4] Towards Efficient Pre-Trained Language Model via Feature Correlation Distillation
    Huang, Kun
    Guo, Xin
    Wang, Meng
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
  • [5] AdaDS: Adaptive data selection for accelerating pre-trained language model knowledge distillation
    Zhou, Qinhong
    Li, Peng
    Liu, Yang
    Guan, Yuyang
    Xing, Qizhou
    Chen, Ming
    Sun, Maosong
    Liu, Yang
    AI OPEN, 2023, 4 : 56 - 63
  • [6] ReAugKD: Retrieval-Augmented Knowledge Distillation For Pre-trained Language Models
    Zhang, Jianyi
    Muhamed, Aashiq
    Anantharaman, Aditya
    Wang, Guoyin
    Chen, Changyou
    Zhong, Kai
    Cui, Qingjun
    Xu, Yi
    Zeng, Belinda
    Chilimbi, Trishul
    Chen, Yiran
    61ST CONFERENCE OF THE THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, ACL 2023, VOL 2, 2023, : 1128 - 1136
  • [7] IMPROVING CTC-BASED SPEECH RECOGNITION VIA KNOWLEDGE TRANSFERRING FROM PRE-TRAINED LANGUAGE MODELS
    Deng, Keqi
    Cao, Songjun
    Zhang, Yike
    Ma, Long
    Cheng, Gaofeng
    Xu, Ji
    Zhang, Pengyuan
    2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 8517 - 8521
  • [8] Grammatical Error Correction by Transferring Learning Based on Pre-Trained Language Model
    Han M.
    Wang Y.
    Shanghai Jiaotong Daxue Xuebao/Journal of Shanghai Jiaotong University, 2022, 56 (11): : 1554 - 1560
  • [9] Pre-trained language models with domain knowledge for biomedical extractive summarization
    Xie Q.
    Bishop J.A.
    Tiwari P.
    Ananiadou S.
    Knowledge-Based Systems, 2022, 252
  • [10] Knowledge Enhanced Pre-trained Language Model for Product Summarization
    Yin, Wenbo
    Ren, Junxiang
    Wu, Yuejiao
    Song, Ruilin
    Liu, Lang
    Cheng, Zhen
    Wang, Sibo
    NATURAL LANGUAGE PROCESSING AND CHINESE COMPUTING, NLPCC 2022, PT II, 2022, 13552 : 263 - 273