Multilevel Attention-Based Sample Correlations for Knowledge Distillation

Cited by: 58
Authors
Gou, Jianping [1 ,2 ]
Sun, Liyuan [2 ]
Yu, Baosheng [3 ]
Wan, Shaohua [4 ]
Ou, Weihua [5 ]
Yi, Zhang [6 ]
Affiliations
[1] Southwest Univ, Coll Comp & Informat Sci, Coll Software, Chongqing 400715, Peoples R China
[2] Jiangsu Univ, Sch Comp Sci & Commun Engn, Zhenjiang 212013, Jiangsu, Peoples R China
[3] Univ Sydney, Fac Engn, Sch Comp Sci, Sydney, NSW 2008, Australia
[4] Univ Elect Sci & Technol China, Shenzhen Inst Adv Study, Shenzhen 518110, Peoples R China
[5] Guizhou Normal Univ, Sch Big Data & Comp Sci, Guiyang 550025, Guizhou, Peoples R China
[6] Sichuan Univ, Sch Comp Sci, Chengdu 610065, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Knowledge engineering; Correlation; Computational modeling; Training; Neural networks; Deep learning; Informatics; Knowledge distillation (KD); model compression; relational knowledge; visual recognition;
DOI
10.1109/TII.2022.3209672
Chinese Library Classification (CLC)
TP [Automation Technology, Computer Technology];
Discipline Classification Code
0812;
Abstract
Recently, model compression has been widely used to deploy cumbersome deep models on resource-limited edge devices in performance-demanding industrial Internet of Things (IoT) scenarios. As a simple yet effective model compression technique, knowledge distillation (KD) aims to transfer knowledge (e.g., sample relationships as relational knowledge) from a large teacher model to a small student model. However, existing relational KD methods usually build sample correlations directly from the feature maps at a single middle layer of a deep neural network, which tends to overfit the teacher's feature maps and fails to capture the most informative sample regions. Motivated by this observation, we argue that the characteristics of the most informative regions matter most and therefore introduce attention maps to construct sample correlations for knowledge distillation. Specifically, using attention maps from multiple middle layers, attention-based sample correlations are built upon the most informative sample regions and serve as novel and effective relational knowledge for distillation. We refer to the proposed method as multilevel attention-based sample correlations for knowledge distillation (MASCKD). Extensive experiments on popular KD datasets for image classification, image retrieval, and person reidentification demonstrate the effectiveness of the proposed method for relational KD.
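To make the abstract's construction concrete, below is a minimal PyTorch sketch of how attention-based sample correlations could be formed from intermediate feature maps and matched between teacher and student. The attention operator (channel-wise mean of squared activations), the function names (attention_map, sample_correlation, masckd_loss), and the use of an MSE loss between correlation matrices are illustrative assumptions, not the authors' released implementation.

import torch
import torch.nn.functional as F

def attention_map(feature):
    # Spatial attention from a (B, C, H, W) feature map: channel-wise
    # mean of squared activations, flattened and L2-normalized per sample.
    att = feature.pow(2).mean(dim=1).flatten(1)  # (B, H*W)
    return F.normalize(att, p=2, dim=1)

def sample_correlation(att):
    # Pairwise correlations between samples in the batch: a (B, B) Gram matrix.
    return att @ att.t()

def masckd_loss(student_feats, teacher_feats):
    # Sum the mismatch between student and teacher sample-correlation
    # matrices over multiple middle layers (the "multilevel" part).
    loss = torch.tensor(0.0)
    for fs, ft in zip(student_feats, teacher_feats):
        cs = sample_correlation(attention_map(fs))
        ct = sample_correlation(attention_map(ft))
        loss = loss + F.mse_loss(cs, ct)
    return loss

# Example: a batch of 8 samples with features from two middle layers; channel
# counts may differ between teacher and student, but spatial sizes must match.
student = [torch.randn(8, 64, 16, 16), torch.randn(8, 128, 8, 8)]
teacher = [torch.randn(8, 256, 16, 16), torch.randn(8, 512, 8, 8)]
print(masckd_loss(student, teacher))  # relational KD term added to the task loss

Because each correlation matrix is (B, B) regardless of channel width, the teacher and student need not share architectures at the compared layers; only the batch pairing matters.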
Pages: 7099-7109
Number of pages: 11
Related Papers
50 records in total
  • [31] Learning Attention-based Embeddings for Relation Prediction in Knowledge Graphs
    Nathani, Deepak
    Chauhan, Jatin
    Sharma, Charu
    Kaul, Manohar
    57TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2019), 2019, : 4710 - 4723
  • [32] Bilinear Fusion of Commonsense Knowledge with Attention-Based NLI Models
    Gajbhiye, Amit
    Winterbottom, Thomas
    Al Moubayed, Noura
    Bradley, Steven
    ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING, ICANN 2020, PT I, 2020, 12396 : 633 - 646
  • [33] Attention-based Multi-hop Reasoning for Knowledge Graph
    Wang, Zikang
    Li, Linjing
    Zeng, Daniel Dajun
    Chen, Yue
    2018 IEEE INTERNATIONAL CONFERENCE ON INTELLIGENCE AND SECURITY INFORMATICS (ISI), 2018, : 211 - 213
  • [34] Learning from Interpretable Analysis: Attention-Based Knowledge Tracing
    Zhu, Jia
    Yu, Weihao
    Zheng, Zetao
    Huang, Changqin
    Tang, Yong
    Fung, Gabriel Pui Cheong
    ARTIFICIAL INTELLIGENCE IN EDUCATION (AIED 2020), PT II, 2020, 12164 : 364 - 368
  • [35] Attention-based Bidirectional Long Short-Term Memory Networks for Relation Classification Using Knowledge Distillation from BERT
    Wang, Zihan
    Yang, Bo
    2020 IEEE INTL CONF ON DEPENDABLE, AUTONOMIC AND SECURE COMPUTING, INTL CONF ON PERVASIVE INTELLIGENCE AND COMPUTING, INTL CONF ON CLOUD AND BIG DATA COMPUTING, INTL CONF ON CYBER SCIENCE AND TECHNOLOGY CONGRESS (DASC/PICOM/CBDCOM/CYBERSCITECH), 2020, : 562 - 568
  • [36] MATNet: Multilevel attention-based transformers for change detection in remote sensing images
    Zhang, Zhongyu
    Liu, Shujun
    Qin, Yingxiang
    Wang, Huajun
    IMAGE AND VISION COMPUTING, 2024, 151
  • [37] Attention-based similarity
    Stentiford, Fred
    PATTERN RECOGNITION, 2007, 40 (03) : 771 - 783
  • [38] Self-knowledge distillation based on dynamic mixed attention
    Tang, Yuan
    Chen, Ying
    Kongzhi yu Juece/Control and Decision, 2024, 39 (12) : 4099 - 4108
  • [39] Attention-based learning
    Kasderidis, S
    Taylor, JG
    2004 IEEE INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS, VOLS 1-4, PROCEEDINGS, 2004, : 525 - 530
  • [40] CityCross: Transferring Attention-based Knowledge for Location-based Advertising Recommendation
    Qiu, Dazhuo
    Wang, Yihao
    Zhao, Yan
    Deng, Liwei
    Zheng, Kai
    2022 23RD IEEE INTERNATIONAL CONFERENCE ON MOBILE DATA MANAGEMENT (MDM 2022), 2022, : 254 - 261