Explanation Guided Knowledge Distillation for Pre-trained Language Model Compression

Cited by: 1
Authors
Yang, Zhao [1 ,2 ]
Zhang, Yuanzhe [1 ,2 ]
Sui, Dianbo [3 ]
Ju, Yiming [1 ,2 ]
Zhao, Jun [1 ,2 ]
Liu, Kang [1 ,2 ]
Affiliations
[1] Univ Chinese Acad Sci, Sch Artificial Intelligence, 95 Zhongguancun East Rd, Beijing 100190, Peoples R China
[2] Chinese Acad Sci, Inst Automat, Lab Cognit & Decis Intelligence Complex Syst, 95 Zhongguancun East Rd, Beijing 100190, Peoples R China
[3] Harbin Inst Technol Weihai, 95 Zhongguancun East Rd, Beijing 100190, Peoples R China
Funding
National Natural Science Foundation of China; National Key Research and Development Program of China;
Keywords
Explanation; knowledge distillation; model compression;
DOI
10.1145/3639364
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Knowledge distillation, which transfers knowledge from a cumbersome model to a lightweight one, is widely used in pre-trained language model compression. Although knowledge-distillation-based model compression has achieved promising performance, we observe that the explanations of the teacher model and the student model are not consistent. We argue that the student model should learn not only the predictions of the teacher model but also its internal reasoning process. To this end, we propose Explanation Guided Knowledge Distillation (EGKD) in this article, which uses explanations to represent the thinking process and improve knowledge distillation. To obtain explanations in our distillation framework, we select three typical explanation methods rooted in different mechanisms, namely gradient-based, perturbation-based, and feature selection methods. Then, to improve computational efficiency, we propose different optimization strategies for using the explanations obtained by these three explanation methods, which provide the student model with better learning guidance. Experimental results on GLUE demonstrate that leveraging explanations can improve the performance of the student model. Moreover, EGKD can also be applied to model compression with different architectures.
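The abstract does not give the loss used in EGKD, so the following is only a minimal sketch of the general idea it describes: a student trained on the teacher's softened predictions plus a term that penalizes disagreement between the two models' explanations. It is not the authors' formulation. The function names, the squared-error explanation term, the `temperature` and `alpha` parameters, and the assumption that per-token explanation scores have already been computed by some explanation method are all illustrative choices.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Numerically stable softmax over the last axis."""
    z = np.asarray(logits, dtype=float) / temperature
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def kd_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened distributions,
    averaged over the batch (standard prediction distillation)."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return float(np.mean(np.sum(p * (np.log(p) - np.log(q)), axis=-1)))

def explanation_loss(teacher_scores, student_scores):
    """Hypothetical explanation-consistency term: squared distance between
    the normalized absolute token-importance scores of the two models.
    The scores are assumed to be precomputed (e.g. by a gradient-based
    method); how EGKD actually uses them is not stated in the abstract."""
    def normalize(scores):
        s = np.abs(np.asarray(scores, dtype=float))
        return s / (s.sum(axis=-1, keepdims=True) + 1e-12)
    t, s = normalize(teacher_scores), normalize(student_scores)
    return float(np.mean(np.sum((t - s) ** 2, axis=-1)))

def combined_loss(teacher_logits, student_logits,
                  teacher_scores, student_scores,
                  temperature=2.0, alpha=0.5):
    """Prediction distillation plus an alpha-weighted explanation term
    (alpha is an assumed hyperparameter)."""
    return (kd_loss(teacher_logits, student_logits, temperature)
            + alpha * explanation_loss(teacher_scores, student_scores))

if __name__ == "__main__":
    # Toy batch: 1 example, 2 classes, 3 input tokens.
    t_logits, s_logits = [[2.0, 0.0]], [[1.0, 0.5]]
    t_scores, s_scores = [[0.7, 0.2, 0.1]], [[0.1, 0.2, 0.7]]
    print(combined_loss(t_logits, s_logits, t_scores, s_scores))
```

Both terms are zero when the student matches the teacher exactly, so in this sketch the explanation term only adds pressure where the two models attribute their predictions to different tokens.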
Pages: 19
Related papers
50 in total
  • [31] SurgicBERTa: a pre-trained language model for procedural surgical language
    Bombieri, Marco
    Rospocher, Marco
    Ponzetto, Simone Paolo
    Fiorini, Paolo
    INTERNATIONAL JOURNAL OF DATA SCIENCE AND ANALYTICS, 2024, 18 (01) : 69 - 81
  • [32] Probing Simile Knowledge from Pre-trained Language Models
    Chen, Weijie
    Chang, Yongzhu
    Zhang, Rongsheng
    Pu, Jiashu
    Chen, Guandan
    Zhang, Le
    Xi, Yadong
    Chen, Yijiang
    Su, Chang
    PROCEEDINGS OF THE 60TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2022), VOL 1: (LONG PAPERS), 2022, : 5875 - 5887
  • [33] ProSide: Knowledge Projector and Sideway for Pre-trained Language Models
    He, Chaofan
    Lu, Gewei
    Shen, Liping
    NATURAL LANGUAGE PROCESSING AND CHINESE COMPUTING, PT II, NLPCC 2024, 2025, 15360 : 56 - 68
  • [34] Continual knowledge infusion into pre-trained biomedical language models
    Jha, Kishlay
    Zhang, Aidong
    BIOINFORMATICS, 2022, 38 (02) : 494 - 502
  • [35] BERTweet: A pre-trained language model for English Tweets
    Dat Quoc Nguyen
    Thanh Vu
    Anh Tuan Nguyen
    PROCEEDINGS OF THE 2020 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING: SYSTEM DEMONSTRATIONS, 2020, : 9 - 14
  • [36] Pre-trained Language Model for Biomedical Question Answering
    Yoon, Wonjin
    Lee, Jinhyuk
    Kim, Donghyeon
    Jeong, Minbyul
    Kang, Jaewoo
    MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES, ECML PKDD 2019, PT II, 2020, 1168 : 727 - 740
  • [37] ViDeBERTa: A powerful pre-trained language model for Vietnamese
    Tran, Cong Dao
    Pham, Nhut Huy
    Nguyen, Anh
    Hy, Truong Son
    Vu, Tu
    17TH CONFERENCE OF THE EUROPEAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, EACL 2023, 2023, : 1071 - 1078
  • [38] Knowledge graph extension with a pre-trained language model via unified learning method
    Choi, Bonggeun
    Ko, Youngjoong
    KNOWLEDGE-BASED SYSTEMS, 2023, 262
  • [39] Misspelling Correction with Pre-trained Contextual Language Model
    Hu, Yifei
    Ting, Xiaonan
    Ko, Youlim
    Rayz, Julia Taylor
    PROCEEDINGS OF 2020 IEEE 19TH INTERNATIONAL CONFERENCE ON COGNITIVE INFORMATICS & COGNITIVE COMPUTING (ICCI*CC 2020), 2020, : 144 - 149
  • [40] A Pre-trained Knowledge Tracing Model with Limited Data
    Yue, Wenli
    Su, Wei
    Liu, Lei
    Cai, Chuan
    Yuan, Yongna
    Jia, Zhongfeng
    Liu, Jiamin
    Xie, Wenjian
    DATABASE AND EXPERT SYSTEMS APPLICATIONS, PT I, DEXA 2024, 2024, 14910 : 163 - 178