Dual-Level Knowledge Distillation via Knowledge Alignment and Correlation

Cited by: 6
Authors
Ding, Fei [1]
Yang, Yin [1]
Hu, Hongxin [2]
Krovi, Venkat [3,4]
Luo, Feng [1]
Affiliations
[1] Clemson Univ, Sch Comp, Clemson, SC 29634 USA
[2] Univ at Buffalo, State Univ New York, Dept Comp Sci & Engn, Buffalo, NY 14260 USA
[3] Clemson Univ, Dept Automot Engn, Clemson, SC 29634 USA
[4] Clemson Univ, Dept Mech Engn, Clemson, SC 29634 USA
Funding
US National Science Foundation (NSF)
关键词
Correlation; Knowledge engineering; Task analysis; Standards; Network architecture; Prototypes; Training; Convolutional neural networks; dual-level knowledge; knowledge distillation (KD); representation learning; teacher-student model;
DOI
10.1109/TNNLS.2022.3190166
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
Knowledge distillation (KD) has become a widely used technique for model compression and knowledge transfer. We find that the standard KD method performs knowledge alignment on individual samples only indirectly, via class prototypes, and neglects the structural knowledge between different samples, namely, knowledge correlation. Although recent contrastive learning-based distillation methods can be decomposed into knowledge alignment and correlation, their correlation objectives undesirably push apart representations of samples from the same class, leading to inferior distillation results. To improve distillation performance, we propose a novel knowledge correlation objective and introduce dual-level knowledge distillation (DLKD), which explicitly combines knowledge alignment and correlation rather than relying on a single contrastive objective. We show that both knowledge alignment and correlation are necessary for improving distillation performance. In particular, knowledge correlation can serve as an effective regularizer for learning generalized representations. The proposed DLKD is task-agnostic and model-agnostic, and enables effective knowledge transfer from supervised or self-supervised pretrained teachers to students. Experiments show that DLKD outperforms other state-of-the-art methods across a wide range of experimental settings, including: 1) pretraining strategies; 2) network architectures; 3) datasets; and 4) tasks.
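Below is a minimal, hypothetical PyTorch sketch of how a dual-level objective of this kind might look, combining a per-sample alignment term with a batch-level correlation term between teacher and student representations. The function name dual_level_kd_loss, the cosine-based alignment term, the mean-squared-error correlation term, and the weights alpha and beta are illustrative assumptions; the paper's exact loss formulations are not given in this record.

```python
# Minimal sketch (assumptions, not the paper's exact formulation): knowledge
# alignment is modeled as per-sample cosine matching between student and
# teacher features, and knowledge correlation as matching the batch-level
# sample-to-sample similarity structure of the teacher.
import torch
import torch.nn.functional as F


def dual_level_kd_loss(student_feat: torch.Tensor,
                       teacher_feat: torch.Tensor,
                       alpha: float = 1.0,
                       beta: float = 1.0) -> torch.Tensor:
    """student_feat, teacher_feat: (batch, dim) representations."""
    s = F.normalize(student_feat, dim=1)
    t = F.normalize(teacher_feat.detach(), dim=1)  # no gradient to the teacher

    # Knowledge alignment: pull each student embedding toward the teacher's
    # embedding of the same sample (instance-level matching).
    align_loss = (1.0 - (s * t).sum(dim=1)).mean()

    # Knowledge correlation: match the teacher's pairwise similarity matrix
    # across the batch, transferring structural knowledge between samples.
    corr_loss = F.mse_loss(s @ s.t(), t @ t.t())

    return alpha * align_loss + beta * corr_loss
```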
Pages: 2425-2435
Number of pages: 11