Explanation Guided Knowledge Distillation for Pre-trained Language Model Compression

Times Cited: 1
Authors
Yang, Zhao [1 ,2 ]
Zhang, Yuanzhe [1 ,2 ]
Sui, Dianbo [3 ]
Ju, Yiming [1 ,2 ]
Zhao, Jun [1 ,2 ]
Liu, Kang [1 ,2 ]
Affiliations
[1] Univ Chinese Acad Sci, Sch Artificial Intelligence, 95 Zhongguancun East Rd, Beijing 100190, Peoples R China
[2] Chinese Acad Sci, Inst Automat, Lab Cognit & Decis Intelligence Complex Syst, 95 Zhongguancun East Rd, Beijing 100190, Peoples R China
[3] Harbin Inst Technol, Weihai, Peoples R China
Funding
National Natural Science Foundation of China; National Key Research and Development Program of China;
Keywords
Explanation; knowledge distillation; model compression;
DOI
10.1145/3639364
CLC Number
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104; 0812; 0835; 1405;
Abstract
Knowledge distillation is widely used in pre-trained language model compression, as it can transfer knowledge from a cumbersome model to a lightweight one. Although knowledge distillation-based model compression has achieved promising performance, we observe that the explanations of the teacher model and the student model are not consistent. We argue that the student model should learn not only the predictions of the teacher model but also its internal reasoning process. To this end, we propose Explanation Guided Knowledge Distillation (EGKD) in this article, which utilizes explanations to represent the thinking process and improve knowledge distillation. To obtain explanations in our distillation framework, we select three typical explanation methods rooted in different mechanisms, namely gradient-based, perturbation-based, and feature selection methods. Then, to improve computational efficiency, we propose different optimization strategies to utilize the explanations obtained by these three explanation methods, which provide the student model with better learning guidance. Experimental results on GLUE demonstrate that leveraging explanations can improve the performance of the student model. Moreover, our EGKD can also be applied to model compression with different architectures.
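The abstract does not give the exact training objective, but the general idea can be sketched: alongside the usual soft-label distillation loss, the student is additionally penalized when its explanation (here, a gradient-based token saliency map, one of the three explanation methods mentioned above) diverges from the teacher's. The following PyTorch sketch is illustrative only; the function names, the loss weights `alpha` and `beta`, the plain classifier signature (embeddings in, logits out), and the choice of MSE over normalized saliencies are assumptions, not the paper's exact formulation.

```python
# Minimal sketch of explanation-guided knowledge distillation (illustrative, not
# the paper's exact objective). Both models are assumed to map token embeddings
# of shape (batch, seq_len, dim) to class logits of shape (batch, num_classes).

import torch
import torch.nn.functional as F


def saliency_and_logits(model, embeddings, labels, create_graph=False):
    """Gradient-based token saliency: L2 norm of d(CE loss)/d(token embedding)."""
    embeddings = embeddings.detach().requires_grad_(True)
    logits = model(embeddings)                      # assumed signature
    loss = F.cross_entropy(logits, labels)
    grads, = torch.autograd.grad(loss, embeddings, create_graph=create_graph)
    return grads.norm(dim=-1), logits               # (batch, seq_len), (batch, C)


def egkd_loss(student, teacher, embeddings, labels,
              temperature=2.0, alpha=0.5, beta=0.1):
    """Hard-label CE + soft-label KD + explanation-alignment loss (hypothetical weights)."""
    # create_graph=True so the explanation term can backpropagate into the student.
    s_sal, s_logits = saliency_and_logits(student, embeddings, labels, create_graph=True)
    t_sal, t_logits = saliency_and_logits(teacher, embeddings, labels, create_graph=False)
    t_sal, t_logits = t_sal.detach(), t_logits.detach()

    # Standard distillation terms: hard labels plus temperature-scaled soft labels.
    ce = F.cross_entropy(s_logits, labels)
    kd = F.kl_div(F.log_softmax(s_logits / temperature, dim=-1),
                  F.softmax(t_logits / temperature, dim=-1),
                  reduction="batchmean") * temperature ** 2

    # Explanation alignment: normalize saliency over tokens and match the distributions.
    expl = F.mse_loss(F.softmax(s_sal, dim=-1), F.softmax(t_sal, dim=-1))

    return (1 - alpha) * ce + alpha * kd + beta * expl
```

Normalizing each saliency map into a distribution over tokens before matching is one plausible way to make teacher and student explanations comparable despite different magnitudes; the paper may use a different alignment objective or one of the other explanation methods (perturbation-based or feature selection).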
Pages: 19
Related Papers
50 records
  • [1] Dynamic Knowledge Distillation for Pre-trained Language Models
    Li, Lei
    Lin, Yankai
    Ren, Shuhuai
    Li, Peng
    Zhou, Jie
    Sun, Xu
    2021 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP 2021), 2021, : 379 - 389
  • [2] KroneckerBERT: Significant Compression of Pre-trained Language Models Through Kronecker Decomposition and Knowledge Distillation
    Tahaei, Marzieh S.
    Charlaix, Ella
    Nia, Vahid Partovi
    Ghodsi, Ali
    Rezagholizadeh, Mehdi
    NAACL 2022: THE 2022 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES, 2022, : 2116 - 2127
  • [3] AdaDS: Adaptive data selection for accelerating pre-trained language model knowledge distillation
    Zhou, Qinhong
    Li, Peng
    Liu, Yang
    Guan, Yuyang
    Xing, Qizhou
    Chen, Ming
    Sun, Maosong
    Liu, Yang
    AI OPEN, 2023, 4 : 56 - 63
  • [4] Knowledge Base Grounded Pre-trained Language Models via Distillation
    Sourty, Raphael
    Moreno, Jose G.
    Servant, Francois-Paul
    Tamine, Lynda
    39TH ANNUAL ACM SYMPOSIUM ON APPLIED COMPUTING, SAC 2024, 2024, : 1617 - 1625
  • [5] Domain Knowledge Transferring for Pre-trained Language Model via Calibrated Activation Boundary Distillation
    Choi, Dongha
    Choi, HongSeok
    Lee, Hyunju
    PROCEEDINGS OF THE 60TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2022), VOL 1: (LONG PAPERS), 2022, : 1658 - 1669
  • [6] ReAugKD: Retrieval-Augmented Knowledge Distillation For Pre-trained Language Models
    Zhang, Jianyi
    Muhamed, Aashiq
    Anantharaman, Aditya
    Wang, Guoyin
    Chen, Changyou
    Zhong, Kai
    Cui, Qingjun
    Xu, Yi
    Zeng, Belinda
    Chilimbi, Trishul
    Chen, Yiran
    61ST CONFERENCE OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, ACL 2023, VOL 2, 2023, : 1128 - 1136
  • [7] Knowledge Enhanced Pre-trained Language Model for Product Summarization
    Yin, Wenbo
    Ren, Junxiang
    Wu, Yuejiao
    Song, Ruilin
    Liu, Lang
    Cheng, Zhen
    Wang, Sibo
    NATURAL LANGUAGE PROCESSING AND CHINESE COMPUTING, NLPCC 2022, PT II, 2022, 13552 : 263 - 273
  • [8] PANLP at MEDIQA 2019: Pre-trained Language Models, Transfer Learning and Knowledge Distillation
    Zhu, Wei
    Zhou, Xiaofeng
    Wang, Keqiang
    Luo, Xun
    Li, Xiepeng
    Ni, Yuan
    Xie, Guotong
    SIGBIOMED WORKSHOP ON BIOMEDICAL NATURAL LANGUAGE PROCESSING (BIONLP 2019), 2019, : 380 - 388
  • [9] Syntax-guided Contrastive Learning for Pre-trained Language Model
    Zhang, Shuai
    Wang, Lijie
    Xiao, Xinyan
    Wu, Hua
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2022), 2022, : 2430 - 2440
  • [10] Knowledge Rumination for Pre-trained Language Models
    Yao, Yunzhi
    Wang, Peng
    Mao, Shengyu
    Tan, Chuanqi
    Huang, Fei
    Chen, Huajun
    Zhang, Ningyu
    2023 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING, EMNLP 2023, 2023, : 3387 - 3404