Towards Efficient Pre-Trained Language Model via Feature Correlation Distillation

Cited by: 0
Authors
Huang, Kun [1 ]
Guo, Xin [1 ]
Wang, Meng [1 ]
Affiliations
[1] Ant Group, Hangzhou, People's Republic of China
Keywords
DOI
Not available
CLC Number
TP18 [Theory of Artificial Intelligence];
Discipline Codes
081104; 0812; 0835; 1405;
Abstract
Knowledge Distillation (KD) has emerged as a promising approach for compressing large Pre-trained Language Models (PLMs). The performance of KD depends on how effectively the knowledge in the teacher model is formulated and transferred to the student model. Prior work mainly focuses on directly aligning the output features of the transformer blocks, which may impose overly strict constraints on the student's learning and complicates training by introducing extra parameters and computational cost. Moreover, our analysis indicates that the relations within self-attention adopted in other works involve additional computational complexity and are easily constrained by the number of attention heads, potentially leading to suboptimal solutions. To address these issues, we propose a novel approach that builds relations directly from the output features. Specifically, we introduce token-level and sequence-level relations concurrently to fully exploit the knowledge of the teacher model. Furthermore, we propose a correlation-based distillation loss that alleviates the exact-match property inherent in the traditional KL-divergence and MSE loss functions. Our method, dubbed FCD, offers a simple yet effective way to compress various architectures (BERT, RoBERTa, and GPT) and model sizes (base and large). Extensive experimental results demonstrate that our distilled, smaller language models significantly outperform existing KD methods across various NLP tasks.
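The record contains no implementation details beyond the abstract, but the core idea it describes (building token-level and sequence-level relation matrices from transformer output features and matching them between teacher and student with a correlation-based loss rather than an exact-match KL/MSE objective) can be sketched as follows. This is a minimal PyTorch sketch under stated assumptions: hidden states of shape [batch, seq_len, dim], cosine-similarity relation matrices, and a Pearson-correlation matching loss; the helper names (token_relation, sequence_relation, fcd_loss) are illustrative and are not the paper's actual implementation.

```python
# Minimal, illustrative sketch of feature-correlation distillation.
# Assumptions: hidden states are [batch, seq_len, dim]; the paper's exact
# relation and loss definitions may differ from these choices.
import torch
import torch.nn.functional as F


def token_relation(h):
    # Token-level relation: pairwise cosine similarity between tokens
    # within each sequence.
    h = F.normalize(h, dim=-1)              # [B, T, D]
    return h @ h.transpose(-1, -2)          # [B, T, T]


def sequence_relation(h):
    # Sequence-level relation: pairwise cosine similarity between
    # mean-pooled sequence representations in the batch.
    s = F.normalize(h.mean(dim=1), dim=-1)  # [B, D]
    return s @ s.t()                        # [B, B]


def correlation_loss(r_student, r_teacher):
    # Correlation-based matching: 1 - Pearson correlation between the
    # flattened relation matrices, which is invariant to scale and shift
    # and therefore softer than an exact-match MSE or KL objective.
    rs = r_student.flatten(1).float()
    rt = r_teacher.flatten(1).float()
    rs = rs - rs.mean(dim=1, keepdim=True)
    rt = rt - rt.mean(dim=1, keepdim=True)
    corr = (rs * rt).sum(dim=1) / (rs.norm(dim=1) * rt.norm(dim=1) + 1e-8)
    return (1.0 - corr).mean()


def fcd_loss(student_hidden, teacher_hidden):
    # Combine token-level and sequence-level relation distillation.
    teacher_hidden = teacher_hidden.detach()  # no gradient to the teacher
    loss_tok = correlation_loss(token_relation(student_hidden),
                                token_relation(teacher_hidden))
    loss_seq = correlation_loss(sequence_relation(student_hidden),
                                sequence_relation(teacher_hidden))
    return loss_tok + loss_seq
```

In a distillation loop, such a loss would be added to the student's usual task loss; how the paper weights the token-level and sequence-level terms is not specified here, so the unweighted sum above is only an assumption.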
Pages: 15