Contrastive Distillation on Intermediate Representations for Language Model Compression

Cited by: 0
Authors
Sun, Siqi [1 ]
Gan, Zhe [1 ]
Cheng, Yu [1 ]
Fang, Yuwei [1 ]
Wang, Shuohang [1 ]
Liu, Jingjing [1 ]
Affiliation
[1] Microsoft Dynamics 365 Research, Redmond, WA 98008 USA
Keywords
DOI
None available
Chinese Library Classification
TP18 [Theory of Artificial Intelligence];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Existing language model compression methods mostly use a simple L2 loss to distill knowledge from the intermediate representations of a large BERT model into a smaller one. Although widely used, this objective by design assumes that all dimensions of the hidden representations are independent, and therefore fails to capture important structural knowledge in the intermediate layers of the teacher network. To achieve better distillation efficacy, we propose Contrastive Distillation on Intermediate Representations (CoDIR), a principled knowledge distillation framework in which the student is trained to distill knowledge from the teacher's intermediate layers via a contrastive objective. By learning to distinguish a positive sample from a large set of negative samples, CoDIR facilitates the student's exploitation of the rich information in the teacher's hidden layers. CoDIR can be readily applied to compress large-scale language models in both the pre-training and fine-tuning stages, and achieves superb performance on the GLUE benchmark, outperforming state-of-the-art compression methods.
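To make the contrastive objective in the abstract concrete, below is a minimal, illustrative sketch of an InfoNCE-style contrastive loss on pooled intermediate representations of a teacher and a student. The class name `ContrastiveIntermediateLoss`, the linear projection heads, the mean pooling over tokens, and the use of in-batch negatives are simplifying assumptions for illustration; they are not the exact CoDIR formulation (which, per the abstract, draws negatives from a large sample set).

```python
# Hypothetical sketch of contrastive distillation on one pair of matched
# intermediate layers. Assumes in-batch negatives rather than CoDIR's
# large negative-sample set.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ContrastiveIntermediateLoss(nn.Module):
    def __init__(self, student_dim: int, teacher_dim: int,
                 proj_dim: int = 128, temperature: float = 0.1):
        super().__init__()
        # Project student and teacher hidden states into a shared space.
        self.student_proj = nn.Linear(student_dim, proj_dim)
        self.teacher_proj = nn.Linear(teacher_dim, proj_dim)
        self.temperature = temperature

    def forward(self, student_hidden: torch.Tensor,
                teacher_hidden: torch.Tensor) -> torch.Tensor:
        # student_hidden: (batch, seq_len, student_dim) from one student layer
        # teacher_hidden: (batch, seq_len, teacher_dim) from the matched teacher layer
        s = F.normalize(self.student_proj(student_hidden.mean(dim=1)), dim=-1)
        t = F.normalize(self.teacher_proj(teacher_hidden.mean(dim=1)), dim=-1)

        # Similarity of each student example to every teacher example in the
        # batch; diagonal entries are positive pairs, off-diagonals are negatives.
        logits = s @ t.t() / self.temperature
        targets = torch.arange(s.size(0), device=s.device)
        return F.cross_entropy(logits, targets)
```

In a typical distillation setup, a loss like this would be added to the standard task and logit-distillation losses, so the student matches the teacher's intermediate structure as well as its outputs.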
Pages: 498-508
Number of pages: 11
Related papers
50 records in total
  • [41] Pre-trained Language Model Representations for Language Generation
    Edunov, Sergey
    Baevski, Alexei
    Auli, Michael
    2019 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES (NAACL HLT 2019), VOL. 1, 2019, : 4052 - 4059
  • [42] Learning Unified Video-Language Representations via Joint Modeling and Contrastive Learning for Natural Language Video Localization
    Cui, Chenhao
    Liang, Xinnian
    Wu, Shuangzhi
    Li, Zhoujun
    2023 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS, IJCNN, 2023
  • [43] Multidisciplinary Model Transformation through Simplified Intermediate Representations
    Cole, Bjorn
    Dinkel, Kevin
    2016 IEEE AEROSPACE CONFERENCE, 2016
  • [44] SMOOTHLM: A LANGUAGE MODEL COMPRESSION LIBRARY
    Akin, Ahmet Afsin
    Demir, Cemil
    2014 22ND SIGNAL PROCESSING AND COMMUNICATIONS APPLICATIONS CONFERENCE (SIU), 2014, : 1833 - 1836
  • [45] Contrastive Learning with Adversarial Examples for Alleviating Pathology of Language Model
    Zhan, Pengwei
    Yang, Jing
    Huang, Xiao
    Jing, Chunlei
    Li, Jingying
    Wang, Liming
    PROCEEDINGS OF THE 61ST ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, ACL 2023, VOL 1, 2023, : 6493 - 6508
  • [46] Customizing Language Model Responses with Contrastive In-Context Learning
    Gao, Xiang
    Das, Kamalika
    THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 16, 2024, : 18039 - 18046
  • [47] Learning Low-Rank Representations for Model Compression
    Zhu, Zezhou
    Dong, Yuan
    Zhao, Zhong
    2023 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS, IJCNN, 2023
  • [48] Model Compression Algorithm via Reinforcement Learning and Knowledge Distillation
    Liu, Botao
    Hu, Bing-Bing
    Zhao, Ming
    Peng, Sheng-Lung
    Chang, Jou-Ming
    Tsoulos, Ioannis G.
    MATHEMATICS, 2023, 11 (22)
  • [49] PQK: Model Compression via Pruning, Quantization, and Knowledge Distillation
    Kim, Jangho
    Chang, Simyung
    Kwak, Nojun
    INTERSPEECH 2021, 2021, : 4568 - 4572
  • [50] Model Compression Based on Knowledge Distillation and Its Application in HRRP
    Chen, Xiaojiao
    An, Zhenyu
    Huang, Liansheng
    He, Shiying
    Wang, Zhen
    PROCEEDINGS OF 2020 IEEE 4TH INFORMATION TECHNOLOGY, NETWORKING, ELECTRONIC AND AUTOMATION CONTROL CONFERENCE (ITNEC 2020), 2020, : 1268 - 1272