IMF: Integrating Matched Features Using Attentive Logit in Knowledge Distillation

Cited by: 0
Authors
Kim, Jeongho [1 ]
Lee, Hanbeen [2 ]
Woo, Simon S. [3 ]
Affiliations
[1] Korea Adv Inst Sci & Technol, Daejeon, South Korea
[2] NAVER Z Corp, Seongnam, South Korea
[3] Sungkyunkwan Univ, Dept Artificial Intelligence, Seoul, South Korea
Keywords: (none listed)
DOI: (not available)
CLC number: TP18 [Artificial intelligence theory]
Discipline codes: 081104; 0812; 0835; 1405
Abstract
Knowledge distillation (KD) is an effective method for transferring knowledge from a teacher model to a student model, with the aim of efficiently improving the student's performance. Although generic knowledge distillation methods such as softmax representation distillation and intermediate feature matching have demonstrated improvements on various tasks, student networks show only marginal gains because of their limited model capacity. In this work, to address the student model's limitation, we propose a novel and flexible KD framework, Integrating Matched Features using Attentive Logit in Knowledge Distillation (IMF). Our approach introduces an intermediate feature distiller (IFD) that improves the overall performance of the student model by directly distilling the teacher's knowledge into branches of the student model. The outputs of the IFD, which is trained by the teacher model, are effectively combined by the attentive logit. During inference, we use only a few blocks of the student together with the trained IFD, requiring an equal or smaller number of parameters. Through extensive experiments, we demonstrate that IMF consistently outperforms other state-of-the-art methods by a large margin across various datasets and tasks without extra computation.
Pages: 974+
Number of pages: 10
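The abstract above only outlines the IMF design, but its two core ingredients, an intermediate feature distiller branch and an attentive combination of branch logits, can be sketched in code. The following PyTorch-style sketch is based solely on the abstract: the class names IFD and AttentiveLogit, the layer choices, and the loss formulation are illustrative assumptions, not the paper's actual implementation.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class IFD(nn.Module):
        # Intermediate feature distiller (assumed form): aligns an intermediate
        # student feature map to the teacher's channel width and produces its
        # own class logits from that aligned feature.
        def __init__(self, student_ch, teacher_ch, num_classes):
            super().__init__()
            self.align = nn.Conv2d(student_ch, teacher_ch, kernel_size=1)
            self.pool = nn.AdaptiveAvgPool2d(1)
            self.fc = nn.Linear(teacher_ch, num_classes)

        def forward(self, feat):
            aligned = self.align(feat)                        # feature matched to the teacher
            logits = self.fc(self.pool(aligned).flatten(1))   # branch logits
            return aligned, logits

    class AttentiveLogit(nn.Module):
        # Combines the logits of several branches with learned, per-sample
        # attention weights (an assumed realization of the "attentive logit").
        def __init__(self, num_branches, num_classes):
            super().__init__()
            self.gate = nn.Linear(num_branches * num_classes, num_branches)

        def forward(self, logits_list):
            stacked = torch.stack(logits_list, dim=1)                      # (B, K, C)
            weights = torch.softmax(self.gate(stacked.flatten(1)), dim=1)  # (B, K)
            return (weights.unsqueeze(-1) * stacked).sum(dim=1)            # (B, C)

    def imf_losses(student_logits, branch_feat, branch_logits,
                   teacher_feat, teacher_logits, fuse, temperature=4.0):
        # Assumed training objective: match the branch feature to the teacher's
        # intermediate feature (same spatial size assumed), and distill the
        # teacher's softened softmax into the attentively combined logits.
        combined = fuse([student_logits, branch_logits])
        feat_loss = F.mse_loss(branch_feat, teacher_feat)
        kd_loss = F.kl_div(F.log_softmax(combined / temperature, dim=1),
                           F.softmax(teacher_logits / temperature, dim=1),
                           reduction="batchmean") * temperature ** 2
        return feat_loss, kd_loss

As the abstract notes, inference would keep only a few student blocks plus the trained IFD, so the combined prediction can be produced with an equal or smaller parameter count than the plain student.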
Related papers
(50 records in total)
  • [41] Heterogeneous Knowledge Distillation using Information Flow Modeling
    Passalis, N.
    Tzelepi, M.
    Tefas, A.
    2020 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2020, : 2336 - 2345
  • [42] Improving the accuracy of pruned network using knowledge distillation
    Prakosa, Setya Widyawan
    Leu, Jenq-Shiou
    Chen, Zhao-Hong
    PATTERN ANALYSIS AND APPLICATIONS, 2021, 24 (02) : 819 - 830
  • [43] BAG-OF-FEATURES-BASED KNOWLEDGE DISTILLATION FOR LIGHTWEIGHT CONVOLUTIONAL NEURAL NETWORKS
    Chariton, Alexandros
    Passalis, Nikolaos
    Tefas, Anastasios
    2022 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, ICIP, 2022, : 1541 - 1545
  • [44] Conditional Response Augmentation for Dialogue using Knowledge Distillation
    Jeong, Myeongho
    Choi, Seungtaek
    Han, Hojae
    Kim, Kyungho
    Hwang, Seung-won
    INTERSPEECH 2020, 2020, : 3890 - 3894
  • [46] Improving Neural Topic Models using Knowledge Distillation
    Hoyle, Alexander
    Goel, Pranav
    Resnik, Philip
    PROCEEDINGS OF THE 2020 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP), 2020, : 1752 - 1771
  • [47] Improving Knowledge Base Updates with CAIA: A Method Utilizing Capsule Network and Attentive Intratriplet Association Features
    Qiu, Jingxiong
    Sun, Linfu
    Han, Min
    JOURNAL OF SENSORS, 2023, 2023
  • [48] Correction to: Embedded mutual learning: a novel online distillation method integrating diverse knowledge sources
    Li, Chuanxiu
    Li, Guangli
    Zhang, Hongbin
    Ji, Donghong
    APPLIED INTELLIGENCE, 2023, 53 : 17240 - 17240
  • [49] Transpose and Mask: Simple and Effective Logit-Based Knowledge Distillation for Multi-attribute and Multi-label Classification
    Zhao, Yuwei
    Li, Annan
    Peng, Guozhen
    Wang, Yunhong
    PATTERN RECOGNITION AND COMPUTER VISION, PRCV 2023, PT X, 2024, 14434 : 273 - 284
  • [50] Logit Variated Product Quantization Based on Parts Interaction and Metric Learning With Knowledge Distillation for Fine-Grained Image Retrieval
    Ma, Lei
    Luo, Xin
    Hong, Hanyu
    Meng, Fanman
    Wu, Qingbo
    IEEE TRANSACTIONS ON MULTIMEDIA, 2024, 26 : 10406 - 10419