Contrastive Distillation on Intermediate Representations for Language Model Compression

Cited by: 0
Authors
Sun, Siqi [1 ]
Gan, Zhe [1 ]
Cheng, Yu [1 ]
Fang, Yuwei [1 ]
Wang, Shuohang [1 ]
Liu, Jingjing [1 ]
Affiliations
[1] Microsoft Dynamics 365 Research, Redmond, WA 98008 USA
Keywords
DOI
Not available
CLC Number
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Existing language model compression methods mostly use a simple L2 loss to distill knowledge from the intermediate representations of a large BERT model into a smaller one. Although widely used, this objective by design assumes that all dimensions of the hidden representations are independent, failing to capture important structural knowledge in the intermediate layers of the teacher network. To achieve better distillation efficacy, we propose Contrastive Distillation on Intermediate Representations (CoDIR), a principled knowledge distillation framework in which the student is trained to distill knowledge from the teacher's intermediate layers via a contrastive objective. By learning to distinguish a positive sample from a large set of negative samples, CoDIR facilitates the student's exploitation of the rich information in the teacher's hidden layers. CoDIR can be readily applied to compress large-scale language models in both the pre-training and fine-tuning stages, and achieves superb performance on the GLUE benchmark, outperforming state-of-the-art compression methods.
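For intuition, the sketch below shows what a contrastive objective on intermediate representations could look like in PyTorch. It is only a minimal illustration, assuming pooled student and teacher hidden states for the same input form the positive pair and teacher representations of other samples serve as negatives; the pooling, negative sampling, temperature, and function name are illustrative assumptions, not the paper's exact formulation, and in practice such a term would be combined with the usual task and softmax-level distillation losses.

```python
# Minimal sketch (not the paper's exact formulation) of a contrastive
# distillation objective on intermediate representations, in the spirit of CoDIR.
import torch
import torch.nn.functional as F


def contrastive_distillation_loss(student_hidden, teacher_hidden, negatives, temperature=0.1):
    """InfoNCE-style loss: pull the student's pooled intermediate representation
    toward the teacher's representation of the same input (the positive), and
    push it away from teacher representations of other samples (the negatives).

    student_hidden: (batch, dim)   pooled student intermediate representation
    teacher_hidden: (batch, dim)   pooled teacher intermediate representation
    negatives:      (num_neg, dim) teacher representations of other samples
    """
    s = F.normalize(student_hidden, dim=-1)
    t = F.normalize(teacher_hidden, dim=-1)
    n = F.normalize(negatives, dim=-1)

    pos_logits = (s * t).sum(dim=-1, keepdim=True) / temperature  # (batch, 1)
    neg_logits = (s @ n.t()) / temperature                        # (batch, num_neg)
    logits = torch.cat([pos_logits, neg_logits], dim=1)           # positive at column 0

    # Every row's correct "class" is the positive pair at index 0.
    targets = torch.zeros(s.size(0), dtype=torch.long, device=s.device)
    return F.cross_entropy(logits, targets)
```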
Pages: 498-508
Number of pages: 11
Related Papers
50 records in total
  • [21] Analysis of Model Compression Using Knowledge Distillation
    Hong, Yu-Wei
    Leu, Jenq-Shiou
    Faisal, Muhamad
    Prakosa, Setya Widyawan
    IEEE ACCESS, 2022, 10 : 85095 - 85105
  • [22] Triplet Knowledge Distillation Networks for Model Compression
    Tang, Jialiang
    Jiang, Ning
    Yu, Wenxin
    Wu, Wenqin
    2021 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2021,
  • [23] Private Model Compression via Knowledge Distillation
    Wang, Ji
    Bao, Weidong
    Sun, Lichao
    Zhu, Xiaomin
    Cao, Bokai
    Yu, Philip S.
    THIRTY-THIRD AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FIRST INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE / NINTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2019, : 1190 - +
  • [24] Expectation-Maximization Contrastive Learning for Compact Video-and-Language Representations
    Jin, Peng
    Huang, Jinfa
    Liu, Fenglin
    Wu, Xian
    Ge, Shen
    Song, Guoli
    Clifton, David A.
    Chen, Jie
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022,
  • [25] A Task-Efficient Gradient Guide Knowledge Distillation for Pre-train Language Model Compression
    Liu, Xu
    Su, Yila
    Wu, Nier
    ADVANCED INTELLIGENT COMPUTING TECHNOLOGY AND APPLICATIONS, PT III, ICIC 2024, 2024, 14877 : 366 - 377
  • [26] Wasserstein Contrastive Representation Distillation
    Chen, Liqun
    Wang, Dong
    Gan, Zhe
    Liu, Jingjing
    Henao, Ricardo
    Carin, Lawrence
    2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021, 2021, : 16291 - 16300
  • [27] Complementary Relation Contrastive Distillation
    Zhu, Jinguo
    Tang, Shixiang
    Chen, Dapeng
    Yu, Shijie
    Liu, Yakun
    Rong, Mingzhe
    Yang, Aijun
    Wang, Xiaohua
    2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021, 2021, : 9256 - 9265
  • [28] MaskCLIP: Masked Self-Distillation Advances Contrastive Language-Image Pretraining
    Dong, Xiaoyi
    Bao, Jianmin
    Zheng, Yinglin
    Zhang, Ting
    Chen, Dongdong
    Yang, Hao
    Zeng, Ming
    Zhang, Weiming
    Yuan, Lu
    Chen, Dong
    Wen, Fang
    Yu, Nenghai
    2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2023, : 10995 - 11005
  • [29] Slimmed Asymmetrical Contrastive Learning and Cross Distillation for Lightweight Model Training
    Meng, Jian
    Yang, Li
    Lee, Kyungmin
    Shin, Jinwoo
    Fan, Deliang
    Seo, Jae-sun
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
  • [30] Revisiting Intermediate Layer Distillation for Compressing Language Models: An Overfitting Perspective
    Ko, Jongwoo
    Park, Seungjoon
    Jeong, Minchan
    Hong, Sukjin
    Ahn, Euijai
    Chang, Du-Seong
    Yun, Se-Young
    17TH CONFERENCE OF THE EUROPEAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, EACL 2023, 2023, : 158 - 175