Domain Knowledge Transferring for Pre-trained Language Model via Calibrated Activation Boundary Distillation

Cited by: 0
Authors
Choi, Dongha [1 ]
Choi, HongSeok [2 ]
Lee, Hyunju [1 ,2 ]
Affiliations
[1] Gwangju Inst Sci & Technol, Artificial Intelligence Grad Sch, Gwangju 61005, South Korea
[2] Gwangju Inst Sci & Technol, Sch Elect Engn & Comp Sci, Gwangju 61005, South Korea
Funding
National Research Foundation of Singapore;
Keywords
DOI
Not available
CLC number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Since the development and widespread use of pre-trained language models (PLMs), several approaches have been applied to boost their performance on downstream tasks in specific domains, such as the biomedical or scientific domains. Additional pre-training with in-domain texts is the most common approach for providing domain-specific knowledge to PLMs. However, these pre-training methods require considerable in-domain data, substantial training resources, and long training times. Moreover, the training must be repeated whenever a new PLM emerges. In this study, we propose a domain knowledge transferring (DoKTra) framework for PLMs that requires no additional in-domain pre-training. Specifically, we extract domain knowledge from an existing in-domain pre-trained language model and transfer it to other PLMs by applying knowledge distillation. In particular, we employ activation boundary distillation, which focuses on the activation of hidden neurons. We also apply an entropy regularization term in both teacher training and distillation to encourage the model to generate reliable output probabilities and thus aid the distillation. By applying the proposed DoKTra framework to downstream tasks in the biomedical, clinical, and financial domains, our student models retain a high percentage of the teacher's performance and even outperform the teachers on certain tasks.
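The two ingredients the abstract names, activation boundary distillation over hidden neurons (in the spirit of Heo et al., 2019) and an entropy regularization term that encourages calibrated output probabilities, can be sketched as loss terms. The PyTorch snippet below is a minimal, hypothetical illustration only: the function names, the margin, the loss weights, and the sign convention of the entropy term are assumptions for exposition, not the authors' DoKTra implementation, and it assumes the teacher and student hidden representations already share a shape (the paper's setting may require a learned connector).

import torch
import torch.nn.functional as F

def activation_boundary_loss(teacher_pre_act: torch.Tensor,
                             student_pre_act: torch.Tensor,
                             margin: float = 1.0) -> torch.Tensor:
    # Penalize the student when its hidden neurons fall on the wrong side of
    # the teacher's activation boundary (the sign of the pre-activation),
    # using hinge-style terms as in activation boundary distillation.
    teacher_on = (teacher_pre_act > 0).float()
    push_on = teacher_on * F.relu(margin - student_pre_act) ** 2
    push_off = (1.0 - teacher_on) * F.relu(margin + student_pre_act) ** 2
    return (push_on + push_off).mean()

def predictive_entropy(logits: torch.Tensor) -> torch.Tensor:
    # Mean entropy of the output distribution; used below as a confidence
    # penalty (a similar term could be added to the teacher's fine-tuning loss,
    # as the abstract describes entropy regularization in teacher training too).
    log_p = F.log_softmax(logits, dim=-1)
    return -(log_p.exp() * log_p).sum(dim=-1).mean()

def student_distillation_loss(student_logits, labels,
                              teacher_hidden, student_hidden,
                              ab_weight: float = 1.0, ent_weight: float = 0.1):
    # Illustrative combination: task loss plus activation-boundary transfer,
    # minus a weighted entropy term so over-confident outputs are discouraged.
    # ab_weight and ent_weight are made-up hyperparameters.
    task_loss = F.cross_entropy(student_logits, labels)
    ab_loss = activation_boundary_loss(teacher_hidden, student_hidden)
    return task_loss + ab_weight * ab_loss - ent_weight * predictive_entropy(student_logits)

In a full pipeline, teacher_hidden and student_hidden would be pre-activation hidden states computed by the in-domain teacher PLM and the student PLM on the same inputs, with gradients flowing only through the student.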
Pages: 1658-1669
Number of pages: 12
Related Papers (50 in total)
  • [41] CLIPTrans: Transferring Visual Knowledge with Pre-trained Models for Multimodal Machine Translation
    Gupta, Devaansh
    Kharbanda, Siddhant
    Zhou, Jiawei
    Li, Wanhua
    Pfister, Hanspeter
    Wei, Donglai
    2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION, ICCV, 2023, : 2863 - 2874
  • [42] DKPLM: Decomposable Knowledge-Enhanced Pre-trained Language Model for Natural Language Understanding
    Zhang, Taolin
    Wang, Chengyu
    Hu, Nan
    Qiu, Minghui
    Tang, Chengguang
    He, Xiaofeng
    Huang, Jun
    THIRTY-SIXTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FOURTH CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE / TWELVETH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2022, : 11703 - 11711
  • [43] Preserving Commonsense Knowledge from Pre-trained Language Models via Causal Inference
    Zheng, Junhao
    Ma, Qianli
    Qiu, Shengjie
    Wu, Yue
    Ma, Peitian
    Liu, Junlong
    Feng, Huawen
    Shang, Xichen
    Chen, Haibin
    PROCEEDINGS OF THE 61ST ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2023): LONG PAPERS, VOL 1, 2023, : 9155 - 9173
  • [44] Surgicberta: a pre-trained language model for procedural surgical language
    Bombieri, Marco
    Rospocher, Marco
    Ponzetto, Simone Paolo
    Fiorini, Paolo
    INTERNATIONAL JOURNAL OF DATA SCIENCE AND ANALYTICS, 2024, 18 (01) : 69 - 81
  • [45] Protocol for the automatic extraction of epidemiological information via a pre-trained language model
    Wang, Zhizheng
    Liu, Xiao Fan
    Du, Zhanwei
    Wang, Lin
    Wu, Ye
    Holme, Petter
    Lachmann, Michael
    Lin, Hongfei
    Wang, Zhuoyue
    Cao, Yu
    Wong, Zoie S. Y.
    Xu, Xiao-Ke
    Sun, Yuanyuan
    STAR PROTOCOLS, 2023, 4 (03):
  • [46] Enhancing Chinese Pre-trained Language Model via Heterogeneous Linguistics Graph
    Li, Yanzeng
    Cao, Jiangxia
    Cong, Xin
    Zhang, Zhenyu
    Yu, Bowen
    Zhu, Hongsong
    Liu, Tingwen
    PROCEEDINGS OF THE 60TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2022), VOL 1: (LONG PAPERS), 2022, : 1986 - 1996
  • [47] Pre-trained Language Models in Biomedical Domain: A Systematic Survey
    Wang, Benyou
    Xie, Qianqian
    Pei, Jiahuan
    Chen, Zhihong
    Tiwari, Prayag
    Li, Zhao
    Fu, Jie
    ACM COMPUTING SURVEYS, 2024, 56 (03)
  • [48] Probing Simile Knowledge from Pre-trained Language Models
    Chen, Weijie
    Chang, Yongzhu
    Zhang, Rongsheng
    Pu, Jiashu
    Chen, Guandan
    Zhang, Le
    Xi, Yadong
    Chen, Yijiang
    Su, Chang
    PROCEEDINGS OF THE 60TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2022), VOL 1: (LONG PAPERS), 2022, : 5875 - 5887
  • [49] K-AID: Enhancing Pre-trained Language Models with Domain Knowledge for Question Answering
    Sun, Fu
    Li, Feng-Lin
    Wang, Ruize
    Chen, Qianglong
    Cheng, Xingyi
    Zhang, Ji
    PROCEEDINGS OF THE 30TH ACM INTERNATIONAL CONFERENCE ON INFORMATION & KNOWLEDGE MANAGEMENT, CIKM 2021, 2021, : 4125 - 4134
  • [50] ProSide: Knowledge Projector and Sideway for Pre-trained Language Models
    He, Chaofan
    Lu, Gewei
    Shen, Liping
    NATURAL LANGUAGE PROCESSING AND CHINESE COMPUTING, PT II, NLPCC 2024, 2025, 15360 : 56 - 68