Domain Knowledge Transferring for Pre-trained Language Model via Calibrated Activation Boundary Distillation

Cited by: 0
Authors
Choi, Dongha [1]
Choi, HongSeok [2]
Lee, Hyunju [1,2]
Affiliations
[1] Gwangju Inst Sci & Technol, Artificial Intelligence Grad Sch, Gwangju 61005, South Korea
[2] Gwangju Inst Sci & Technol, Sch Elect Engn & Comp Sci, Gwangju 61005, South Korea
Funding
National Research Foundation of Singapore
Keywords
DOI
Not available
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Since the development and wide adoption of pre-trained language models (PLMs), several approaches have been applied to boost their performance on downstream tasks in specific domains, such as the biomedical or scientific domains. Additional pre-training with in-domain texts is the most common way to provide domain-specific knowledge to PLMs. However, such pre-training requires considerable in-domain data, substantial training resources, and long training times, and it must be repeated whenever a new PLM emerges. In this study, we propose a domain knowledge transferring (DoKTra) framework for PLMs that does not require additional in-domain pre-training. Specifically, we extract domain knowledge from an existing in-domain pre-trained language model and transfer it to other PLMs by applying knowledge distillation. In particular, we employ activation boundary distillation, which focuses on the activation of hidden neurons. We also apply an entropy regularization term in both teacher training and distillation to encourage the model to produce reliable output probabilities, which in turn aids distillation. By applying the proposed DoKTra framework to downstream tasks in the biomedical, clinical, and financial domains, our student models retain a high percentage of teacher performance and even outperform the teachers on certain tasks.
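The abstract only names the two loss ingredients at a high level, so the following is a minimal, hypothetical PyTorch sketch rather than the authors' implementation: a generic activation-boundary distillation term in the style of Heo et al. (2019), which matches the sign of the teacher's hidden pre-activations instead of their values, plus an entropy regularizer (confidence penalty) on the student's output distribution. The margin, loss weights alpha and beta, tensor shapes, and function names are assumptions; the paper's "calibrated" variant may differ in detail.

```python
# Hypothetical sketch of activation-boundary distillation + entropy regularization.
# Not the DoKTra reference code; margin, alpha, beta, and shapes are assumptions.
import torch
import torch.nn.functional as F


def activation_boundary_loss(student_pre_act, teacher_pre_act, margin=1.0):
    """Penalize the student when its hidden pre-activation falls on the wrong
    side of the teacher's activation boundary (teacher_pre_act > 0)."""
    teacher_on = (teacher_pre_act > 0).float()
    # Hinge-style piecewise loss: push the student above +margin where the
    # teacher neuron fires, and below -margin where it does not.
    loss = (teacher_on * F.relu(margin - student_pre_act) ** 2
            + (1.0 - teacher_on) * F.relu(margin + student_pre_act) ** 2)
    return loss.mean()


def entropy_regularizer(logits):
    """Negative entropy of the predicted distribution; adding it to the loss
    with a positive weight penalizes over-confident outputs."""
    probs = F.softmax(logits, dim=-1)
    log_probs = F.log_softmax(logits, dim=-1)
    return (probs * log_probs).sum(dim=-1).mean()


def distillation_loss(student_logits, labels, student_hidden, teacher_hidden,
                      alpha=1.0, beta=0.1):
    """Task loss + activation-boundary distillation + entropy regularization."""
    task = F.cross_entropy(student_logits, labels)
    ab = activation_boundary_loss(student_hidden, teacher_hidden.detach())
    ent = entropy_regularizer(student_logits)
    return task + alpha * ab + beta * ent
```

Under these assumptions, `teacher_hidden` would come from the in-domain teacher and `student_hidden` from the PLM being adapted (projected to the same width if necessary); the same entropy term could also be added when fine-tuning the teacher, as the abstract describes.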
Pages: 1658-1669
Number of pages: 12
Related Papers
50 items in total (entries [31]-[40] shown)
  • [31] A Survey of Knowledge Enhanced Pre-Trained Language Models
    Hu, Linmei
    Liu, Zeyi
    Zhao, Ziwang
    Hou, Lei
    Nie, Liqiang
    Li, Juanzi
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2024, 36 (04) : 1413 - 1430
  • [32] Commonsense Knowledge Transfer for Pre-trained Language Models
    Zhou, Wangchunshu
    Le Bras, Ronan
    Choi, Yejin
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, ACL 2023, 2023, : 5946 - 5960
  • [33] A Pre-trained Language Model for Medical Question Answering Based on Domain Adaption
    Liu, Lang
    Ren, Junxiang
    Wu, Yuejiao
    Song, Ruilin
    Cheng, Zhen
    Wang, Sibo
    NATURAL LANGUAGE PROCESSING AND CHINESE COMPUTING, NLPCC 2022, PT II, 2022, 13552 : 216 - 227
  • [34] Adder Encoder for Pre-trained Language Model
    Ding, Jianbang
    Zhang, Suiyun
    Li, Linlin
    CHINESE COMPUTATIONAL LINGUISTICS, CCL 2023, 2023, 14232 : 339 - 347
  • [35] TPUF: Enhancing Cross-domain Sequential Recommendation via Transferring Pre-trained User Features
    Ding, Yujia
    Li, Huan
    Chen, Ke
    Shou, Lidan
    PROCEEDINGS OF THE 32ND ACM INTERNATIONAL CONFERENCE ON INFORMATION AND KNOWLEDGE MANAGEMENT, CIKM 2023, 2023, : 410 - 419
  • [36] Explainable reasoning over temporal knowledge graphs by pre-trained language model
    Li, Qing
    Wu, Guanzhong
    INFORMATION PROCESSING & MANAGEMENT, 2025, 62 (01)
  • [37] Prompting disentangled embeddings for knowledge graph completion with pre-trained language model
    Geng, Yuxia
    Chen, Jiaoyan
    Zeng, Yuhang
    Chen, Zhuo
    Zhang, Wen
    Pan, Jeff Z.
    Wang, Yuxiang
    Xu, Xiaoliang
    EXPERT SYSTEMS WITH APPLICATIONS, 2025, 268
  • [38] NMT Enhancement based on Knowledge Graph Mining with Pre-trained Language Model
    Yang, Hao
    Qin, Ying
    Deng, Yao
    Wang, Minghan
    2020 22ND INTERNATIONAL CONFERENCE ON ADVANCED COMMUNICATION TECHNOLOGY (ICACT): DIGITAL SECURITY GLOBAL AGENDA FOR SAFE SOCIETY!, 2020, : 185 - 189
  • [39] Mitigating Backdoor Attacks in Pre-Trained Encoders via Self-Supervised Knowledge Distillation
    Bie, Rongfang
    Jiang, Jinxiu
    Xie, Hongcheng
    Guo, Yu
    Miao, Yinbin
    Jia, Xiaohua
    IEEE TRANSACTIONS ON SERVICES COMPUTING, 2024, 17 (05) : 2613 - 2625
  • [40] Mixture-of-Domain-Adapters: Decoupling and Injecting Domain Knowledge to Pre-trained Language Models' Memories
    Diao, Shizhe
    Xu, Tianyang
    Xu, Ruijia
    Wang, Jiawei
    Zhang, Tong
    PROCEEDINGS OF THE 61ST ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, ACL 2023, VOL 1, 2023, : 5113 - 5129