Dual-Level Knowledge Distillation via Knowledge Alignment and Correlation

Cited by: 6
Authors
Ding, Fei [1 ]
Yang, Yin [1 ]
Hu, Hongxin [2 ]
Krovi, Venkat [3 ,4 ]
Luo, Feng [1 ]
Affiliations
[1] Clemson Univ, Sch Comp, Clemson, SC 29634 USA
[2] Univ at Buffalo, State Univ New York, Dept Comp Sci & Engn, Buffalo, NY 14260 USA
[3] Clemson Univ, Dept Automot Engn, Clemson, SC 29634 USA
[4] Clemson Univ, Dept Mech Engn, Clemson, SC 29634 USA
Funding
U.S. National Science Foundation;
Keywords
Correlation; Knowledge engineering; Task analysis; Standards; Network architecture; Prototypes; Training; Convolutional neural networks; dual-level knowledge; knowledge distillation (KD); representation learning; teacher-student model;
DOI
10.1109/TNNLS.2022.3190166
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Knowledge distillation (KD) has become a widely used technique for model compression and knowledge transfer. We find that the standard KD method performs knowledge alignment on individual samples only indirectly, via class prototypes, and neglects the structural knowledge between different samples, namely, knowledge correlation. Although recent contrastive learning-based distillation methods can be decomposed into knowledge alignment and correlation, their correlation objectives undesirably push apart representations of samples from the same class, leading to inferior distillation results. To improve distillation performance, in this work we propose a novel knowledge correlation objective and introduce dual-level knowledge distillation (DLKD), which explicitly combines knowledge alignment and correlation rather than relying on a single contrastive objective. We show that both knowledge alignment and correlation are necessary to improve distillation performance. In particular, knowledge correlation can serve as an effective regularizer for learning generalized representations. The proposed DLKD is task-agnostic and model-agnostic, and enables effective knowledge transfer from supervised or self-supervised pretrained teachers to students. Experiments show that DLKD outperforms other state-of-the-art methods across a wide range of experimental settings, including: 1) pretraining strategies; 2) network architectures; 3) datasets; and 4) tasks.
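The abstract does not spell out the exact form of the two objectives, so the following is only a minimal sketch of the dual-level idea, assuming a cosine-based alignment term (student pulled toward the teacher's embedding of the same sample) and a pairwise similarity-matrix matching term for correlation (student inherits the teacher's inter-sample structure without pushing same-class samples apart). The function names, loss weights alpha/beta, and feature dimensions are illustrative assumptions, not the paper's definitions.

    import torch
    import torch.nn.functional as F

    def alignment_loss(student_feat, teacher_feat):
        # Knowledge alignment (instance level): maximize cosine similarity
        # between each student embedding and the teacher embedding of the
        # same sample.
        s = F.normalize(student_feat, dim=1)
        t = F.normalize(teacher_feat, dim=1)
        return (1.0 - (s * t).sum(dim=1)).mean()

    def correlation_loss(student_feat, teacher_feat):
        # Knowledge correlation (structural level): match the B x B pairwise
        # similarity matrices computed within the batch, so inter-sample
        # relations are transferred rather than per-sample logits alone.
        s = F.normalize(student_feat, dim=1)
        t = F.normalize(teacher_feat, dim=1)
        sim_s = s @ s.t()
        sim_t = t @ t.t()
        return F.mse_loss(sim_s, sim_t)

    def dual_level_kd_loss(student_feat, teacher_feat, alpha=1.0, beta=1.0):
        # Combine both objectives; alpha and beta are illustrative weights.
        return (alpha * alignment_loss(student_feat, teacher_feat)
                + beta * correlation_loss(student_feat, teacher_feat))

    if __name__ == "__main__":
        # Toy batch: 8 samples with 512-dimensional features; in practice the
        # student output is usually projected to the teacher's dimension.
        student_feat = torch.randn(8, 512, requires_grad=True)
        teacher_feat = torch.randn(8, 512)
        loss = dual_level_kd_loss(student_feat, teacher_feat)
        loss.backward()
        print(loss.item())

In this sketch the correlation term acts as the regularizer described in the abstract: it constrains the geometry of the student's batch-level representation without forcing same-class samples apart, unlike a contrastive objective that treats all non-identical samples as negatives.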
Pages: 2425 - 2435
Page count: 11
Related Papers
(50 in total)
  • [41] Dual knowledge distillation for bidirectional neural machine translation
    Zhang, Huaao
    Qiu, Shigui
    Wu, Shilong
    2021 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2021,
  • [42] Dual model knowledge distillation for industrial anomaly detection
    Thomine, Simon
    Snoussi, Hichem
    PATTERN ANALYSIS AND APPLICATIONS, 2024, 27 (03)
  • [43] Frame Correlation Knowledge Distillation for Gait Recognition in the Wild
    Peng, Guozhen
    Zhang, Shaoxiong
    Zhao, Yuwei
    Li, Annan
    Wang, Yunhong
    BIOMETRIC RECOGNITION, CCBR 2023, 2023, 14463 : 280 - 290
  • [44] DUAL KNOWLEDGE DISTILLATION FOR EFFICIENT SOUND EVENT DETECTION
    Xiao, Yang
    Das, Rohan Kumar
    2024 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING WORKSHOPS, ICASSPW 2024, 2024, : 690 - 694
  • [45] Channel-Correlation-Based Selective Knowledge Distillation
    Gou, Jianping
    Xiong, Xiangshuo
    Yu, Baosheng
    Zhan, Yibing
    Yi, Zhang
    IEEE TRANSACTIONS ON COGNITIVE AND DEVELOPMENTAL SYSTEMS, 2023, 15 (03) : 1574 - 1585
  • [46] FedAlign: Federated Model Alignment via Data-Free Knowledge Distillation for Machine Fault Diagnosis
    Sun, Wenjun
    Yan, Ruqiang
    Jin, Ruibing
    Zhao, Rui
    Chen, Zhenghua
    IEEE TRANSACTIONS ON INSTRUMENTATION AND MEASUREMENT, 2024, 73 : 1 - 12
  • [47] Feature-Level Ensemble Knowledge Distillation for Aggregating Knowledge from Multiple Networks
    Park, SeongUk
    Kwak, Nojun
    ECAI 2020: 24TH EUROPEAN CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2020, 325 : 1411 - 1418
  • [48] Art style classification via self-supervised dual-teacher knowledge distillation
    Luo, Mei
    Liu, Li
    Lu, Yue
    Suen, Ching Y.
    APPLIED SOFT COMPUTING, 2025, 174
  • [49] Dual-Branch Knowledge Distillation via Residual Features Aggregation Module for Anomaly Segmentation
    Zhou, You
    Huang, Zihao
    Zeng, Deyu
    Qu, Yanyun
    Wu, Zongze
    IEEE TRANSACTIONS ON INSTRUMENTATION AND MEASUREMENT, 2025, 74
  • [50] Explaining Knowledge Distillation by Quantifying the Knowledge
    Cheng, Xu
    Rao, Zhefan
    Chen, Yilan
    Zhang, Quanshi
    2020 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2020), 2020, : 12922 - 12932