Dual-Level Knowledge Distillation via Knowledge Alignment and Correlation

Cited by: 6
Authors
Ding, Fei [1 ]
Yang, Yin [1 ]
Hu, Hongxin [2 ]
Krovi, Venkat [3 ,4 ]
Luo, Feng [1 ]
Affiliations
[1] Clemson Univ, Sch Comp, Clemson, SC 29634 USA
[2] Univ at Buffalo, State Univ New York, Dept Comp Sci & Engn, Buffalo, NY 14260 USA
[3] Clemson Univ, Dept Automot Engn, Clemson, SC 29634 USA
[4] Clemson Univ, Dept Mech Engn, Clemson, SC 29634 USA
Funding
US National Science Foundation (NSF);
Keywords
Correlation; Knowledge engineering; Task analysis; Standards; Network architecture; Prototypes; Training; Convolutional neural networks; dual-level knowledge; knowledge distillation (KD); representation learning; teacher-student model;
DOI
10.1109/TNNLS.2022.3190166
Chinese Library Classification (CLC) code
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
Knowledge distillation (KD) has become a widely used technique for model compression and knowledge transfer. We find that the standard KD method performs knowledge alignment on each individual sample only indirectly, via class prototypes, and neglects the structural knowledge between different samples, namely, knowledge correlation. Although recent contrastive learning-based distillation methods can be decomposed into knowledge alignment and correlation, their correlation objectives undesirably push apart representations of samples from the same class, leading to inferior distillation results. To improve distillation performance, we propose a novel knowledge correlation objective and introduce dual-level knowledge distillation (DLKD), which explicitly combines knowledge alignment and correlation instead of relying on a single contrastive objective. We show that both knowledge alignment and correlation are necessary to improve distillation performance; in particular, knowledge correlation can serve as an effective regularizer for learning generalized representations. The proposed DLKD is task-agnostic and model-agnostic, and enables effective knowledge transfer from supervised or self-supervised pretrained teachers to students. Experiments show that DLKD outperforms other state-of-the-art methods across a wide range of experimental settings, including: 1) pretraining strategies; 2) network architectures; 3) datasets; and 4) tasks.
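For readers who want a concrete picture of a "dual-level" objective, the sketch below shows one way such a loss could be written in PyTorch: a knowledge-alignment term that matches each student embedding to the teacher's embedding of the same sample, plus a knowledge-correlation term that matches the batch-level similarity structure between samples. The specific loss forms, the function name dual_level_kd_loss, and the weights alpha/beta are illustrative assumptions, not the exact DLKD formulation from the paper.

```python
# Illustrative sketch only (not the authors' exact DLKD losses): combine
# per-sample alignment with batch-level correlation between student and
# teacher embeddings.
import torch
import torch.nn.functional as F


def dual_level_kd_loss(student_feat, teacher_feat, alpha=1.0, beta=1.0):
    """student_feat, teacher_feat: (batch, dim) embeddings from the two networks
    (e.g., after a projection head so their dimensions match)."""
    s = F.normalize(student_feat, dim=1)
    t = F.normalize(teacher_feat, dim=1)

    # Knowledge alignment: pull each student embedding toward the teacher's
    # embedding of the same sample (1 - cosine similarity).
    align_loss = (1.0 - (s * t).sum(dim=1)).mean()

    # Knowledge correlation: match the pairwise similarity matrices so the
    # student preserves the relational structure among different samples.
    corr_s = s @ s.t()  # (batch, batch) student similarities
    corr_t = t @ t.t()  # (batch, batch) teacher similarities
    corr_loss = F.mse_loss(corr_s, corr_t)

    return alpha * align_loss + beta * corr_loss


# Example usage with random features (128 samples, 512-d embeddings).
if __name__ == "__main__":
    s_feat = torch.randn(128, 512)
    t_feat = torch.randn(128, 512)
    print(dual_level_kd_loss(s_feat, t_feat).item())
```

Keeping the two terms separate, rather than folding them into one contrastive objective, mirrors the abstract's point that the correlation term acts as a regularizer on relational structure while the alignment term handles per-sample transfer.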
Pages: 2425-2435
Page count: 11