Domain Knowledge Transferring for Pre-trained Language Model via Calibrated Activation Boundary Distillation

Cited by: 0
Authors
Choi, Dongha [1 ]
Choi, HongSeok [2 ]
Lee, Hyunju [1 ,2 ]
Affiliations
[1] Gwangju Inst Sci & Technol, Artificial Intelligence Grad Sch, Gwangju 61005, South Korea
[2] Gwangju Inst Sci & Technol, Sch Elect Engn & Comp Sci, Gwangju 61005, South Korea
Funding
National Research Foundation, Singapore;
Keywords
DOI
Not available
CLC number
TP18 [Artificial Intelligence Theory];
Subject classification codes
081104; 0812; 0835; 1405;
Abstract
Since the development and wide adoption of pre-trained language models (PLMs), several approaches have been applied to boost their performance on downstream tasks in specific domains, such as the biomedical or scientific domains. Additional pre-training with in-domain texts is the most common way of providing domain-specific knowledge to PLMs. However, such pre-training requires considerable in-domain data, substantial training resources, and long training times, and it must be repeated whenever a new PLM emerges. In this study, we propose a domain knowledge transferring (DoKTra) framework for PLMs that does not require additional in-domain pre-training. Specifically, we extract domain knowledge from an existing in-domain pre-trained language model and transfer it to other PLMs via knowledge distillation. In particular, we employ activation boundary distillation, which focuses on the activation of hidden neurons. We also apply an entropy regularization term in both teacher training and distillation to encourage the model to produce reliable output probabilities, which in turn aids the distillation. By applying the proposed DoKTra framework to downstream tasks in the biomedical, clinical, and financial domains, our student models retain a high percentage of teacher performance and even outperform the teachers on certain tasks.
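As a rough illustration of the two ingredients the abstract names, the sketch below pairs a hinge-style activation-boundary distillation loss (in the spirit of Heo et al., 2019, matching whether each hidden neuron is active in the teacher and in the student) with an entropy term on the output probabilities. This is a minimal sketch, not the paper's implementation: the connector layer, margin, loss weights, and the exact weighting of the entropy term are illustrative assumptions.

```python
# Minimal PyTorch sketch of activation-boundary distillation plus an entropy
# regularizer on the output distribution. Hyperparameters are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

def activation_boundary_loss(student_hidden, teacher_hidden, connector, margin=1.0):
    """Hinge-style AB loss: penalize the student when the sign of its (projected)
    pre-activation disagrees with the teacher's activation boundary."""
    s = connector(student_hidden)                  # map student dim -> teacher dim
    teacher_active = (teacher_hidden > 0).float()  # 1 where the teacher neuron fires
    loss = (teacher_active * F.relu(margin - s)
            + (1.0 - teacher_active) * F.relu(margin + s)) ** 2
    return loss.mean()

def entropy_of_outputs(logits):
    """Mean entropy of the predicted distribution; used as a regularizer to
    encourage confident (low-entropy), reliable output probabilities."""
    p = F.softmax(logits, dim=-1)
    return -(p * F.log_softmax(logits, dim=-1)).sum(dim=-1).mean()

# Toy usage with random tensors; in practice the hidden states would come from
# the teacher and student PLMs on the same downstream-task inputs.
batch, h_teacher, h_student, num_labels = 4, 768, 768, 2
connector = nn.Linear(h_student, h_teacher)
student_logits = torch.randn(batch, num_labels)
student_hidden = torch.randn(batch, h_student)
teacher_hidden = torch.randn(batch, h_teacher)
labels = torch.randint(0, num_labels, (batch,))

total_loss = (F.cross_entropy(student_logits, labels)
              + 1.0 * activation_boundary_loss(student_hidden, teacher_hidden, connector)
              + 0.1 * entropy_of_outputs(student_logits))
```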
Pages: 1658-1669
Number of pages: 12