Multilevel Attention-Based Sample Correlations for Knowledge Distillation

Cited by: 58
Authors
Gou, Jianping [1 ,2 ]
Sun, Liyuan [2 ]
Yu, Baosheng [3 ]
Wan, Shaohua [4 ]
Ou, Weihua [5 ]
Yi, Zhang [6 ]
Affiliations
[1] Southwest Univ, Coll Comp & Informat Sci, Coll Software, Chongqing 400715, Peoples R China
[2] Jiangsu Univ, Sch Comp Sci & Commun Engn, Zhenjiang 212013, Jiangsu, Peoples R China
[3] Univ Sydney, Fac Engn, Sch Comp Sci, Sydney, NSW 2008, Australia
[4] Univ Elect Sci & Technol China, Shenzhen Inst Adv Study, Shenzhen 518110, Peoples R China
[5] Guizhou Normal Univ, Sch Big Data & Comp Sci, Guiyang 550025, Guizhou, Peoples R China
[6] Sichuan Univ, Sch Comp Sci, Chengdu 610065, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Knowledge engineering; Correlation; Computational modeling; Training; Neural networks; Deep learning; Informatics; Knowledge distillation (KD); model compression; relational knowledge; visual recognition;
DOI
10.1109/TII.2022.3209672
CLC Classification
TP [Automation Technology, Computer Technology];
Subject Classification Code
0812 ;
Abstract
Recently, model compression has been widely used to deploy cumbersome deep models on resource-limited edge devices in performance-demanding industrial Internet of Things (IoT) scenarios. As a simple yet effective model compression technique, knowledge distillation (KD) aims to transfer knowledge (e.g., sample relationships as relational knowledge) from a large teacher model to a small student model. However, existing relational KD methods usually build sample correlations directly from the feature maps of a certain middle layer in deep neural networks, which tends to overfit the feature maps of the teacher model and fails to capture the most important sample regions. Motivated by this, we argue that the characteristics of important regions are of great importance and, therefore, introduce attention maps to construct sample correlations for knowledge distillation. Specifically, with attention maps from multiple middle layers, attention-based sample correlations are newly built upon the most informative sample regions and can serve as an effective and novel form of relational knowledge for knowledge distillation. We refer to the proposed method as multilevel attention-based sample correlations for knowledge distillation (MASCKD). We perform extensive experiments on popular KD datasets for image classification, image retrieval, and person reidentification, and the experimental results demonstrate the effectiveness of the proposed method for relational KD.
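The abstract describes the key idea at a high level: attention maps taken from several middle layers are turned into pairwise sample correlations, and the student is trained to match the teacher's correlations. Below is a minimal, hedged sketch of that idea in PyTorch; the attention operator (channel-wise mean of squared activations), the layer pairing, and the MSE matching loss are assumptions for illustration only and are not taken from the paper, and the function names (attention_map, sample_correlation, masckd_style_loss) are hypothetical.

    # Hypothetical sketch of attention-based sample correlations for KD.
    # Not the authors' implementation; operator and loss choices are assumptions.
    import torch
    import torch.nn.functional as F

    def attention_map(feat: torch.Tensor) -> torch.Tensor:
        # Spatial attention from a feature map (B, C, H, W):
        # channel-wise mean of squared activations, flattened and L2-normalized per sample.
        att = feat.pow(2).mean(dim=1)        # (B, H, W)
        att = att.flatten(1)                 # (B, H*W)
        return F.normalize(att, p=2, dim=1)

    def sample_correlation(att: torch.Tensor) -> torch.Tensor:
        # Pairwise sample correlations built on attention maps: a (B, B) Gram matrix.
        return att @ att.t()

    def masckd_style_loss(teacher_feats, student_feats):
        # Sum, over several matched middle layers, of the discrepancy between
        # teacher and student attention-based sample-correlation matrices
        # (MSE is used here purely as an assumed distance).
        loss = 0.0
        for ft, fs in zip(teacher_feats, student_feats):
            ct = sample_correlation(attention_map(ft))
            cs = sample_correlation(attention_map(fs))
            loss = loss + F.mse_loss(cs, ct.detach())
        return loss

    # Usage: teacher_feats and student_feats are lists of intermediate feature maps
    # from corresponding teacher/student layers for the same mini-batch, e.g.
    # kd_loss = masckd_style_loss([t1, t2, t3], [s1, s2, s3])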
Pages: 7099-7109
Number of pages: 11