Multilevel Attention-Based Sample Correlations for Knowledge Distillation

Cited by: 58
Authors
Gou, Jianping [1 ,2 ]
Sun, Liyuan [2 ]
Yu, Baosheng [3 ]
Wan, Shaohua [4 ]
Ou, Weihua [5 ]
Yi, Zhang [6 ]
Affiliations
[1] Southwest Univ, Coll Comp & Informat Sci, Coll Software, Chongqing 400715, Peoples R China
[2] Jiangsu Univ, Sch Comp Sci & Commun Engn, Zhenjiang 212013, Jiangsu, Peoples R China
[3] Univ Sydney, Fac Engn, Sch Comp Sci, Sydney, NSW 2008, Australia
[4] Univ Elect Sci & Technol China, Shenzhen Inst Adv Study, Shenzhen 518110, Peoples R China
[5] Guizhou Normal Univ, Sch Big Data & Comp Sci, Guiyang 550025, Guizhou, Peoples R China
[6] Sichuan Univ, Sch Comp Sci, Chengdu 610065, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Knowledge engineering; Correlation; Computational modeling; Training; Neural networks; Deep learning; Informatics; Knowledge distillation (KD); model compression; relational knowledge; visual recognition;
DOI
10.1109/TII.2022.3209672
Chinese Library Classification (CLC)
TP [automation and computer technology]
Discipline code
0812
Abstract
Recently, model compression has been widely used to deploy cumbersome deep models on resource-limited edge devices in performance-demanding industrial Internet of Things (IoT) scenarios. As a simple yet effective model compression technique, knowledge distillation (KD) aims to transfer knowledge (e.g., sample relationships as relational knowledge) from a large teacher model to a small student model. However, existing relational KD methods usually build sample correlations directly from the feature maps at a certain middle layer of a deep neural network, which tends to overfit the teacher's feature maps and fails to focus on the most informative sample regions. Motivated by this observation, we argue that the characteristics of important regions are crucial and, therefore, introduce attention maps to construct sample correlations for knowledge distillation. Specifically, using attention maps from multiple middle layers, attention-based sample correlations are built upon the most informative sample regions and serve as a novel and effective form of relational knowledge for distillation. We refer to the proposed method as multilevel attention-based sample correlations for knowledge distillation (MASCKD). We perform extensive experiments on popular KD datasets for image classification, image retrieval, and person re-identification; the experimental results demonstrate the effectiveness of the proposed method for relational KD.
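To make the abstract's idea concrete, below is a minimal sketch of how attention-based sample correlations could be distilled across multiple middle layers, assuming a PyTorch setting. The attention-map formula (channel-wise activation energy), the Gram-style correlation matrix, and the MSE matching loss are common choices from the attention-transfer and relational-KD literature; the names attention_map, sample_correlation, masckd_like_loss, and lambda_kd are illustrative and not taken from the paper itself.

import torch
import torch.nn.functional as F

def attention_map(feat: torch.Tensor) -> torch.Tensor:
    # Collapse a feature map (B, C, H, W) into a normalized spatial attention map (B, H*W).
    att = feat.pow(2).mean(dim=1)          # channel-wise activation energy: (B, H, W)
    att = att.flatten(1)                   # (B, H*W)
    return F.normalize(att, p=2, dim=1)    # unit L2 norm per sample

def sample_correlation(att: torch.Tensor) -> torch.Tensor:
    # Pairwise sample correlations (B, B) from normalized attention maps.
    return att @ att.t()

def masckd_like_loss(teacher_feats, student_feats):
    # teacher_feats / student_feats: lists of (B, C, H, W) tensors taken from
    # corresponding middle layers of the teacher and student networks.
    # Spatial sizes may differ between the two networks; only the B x B
    # correlation matrices are matched, so no feature alignment is needed.
    loss = 0.0
    for ft, fs in zip(teacher_feats, student_feats):
        ct = sample_correlation(attention_map(ft)).detach()  # teacher relational knowledge (no gradient)
        cs = sample_correlation(attention_map(fs))           # student relational knowledge
        loss = loss + F.mse_loss(cs, ct)
    return loss

In training, such a relational term would typically be added to the student's usual cross-entropy (and, optionally, logit-distillation) loss, e.g., loss = ce_loss + lambda_kd * masckd_like_loss(t_feats, s_feats), where the intermediate features are collected with forward hooks and lambda_kd is a tuning hyperparameter.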
Pages: 7099-7109
Number of pages: 11