Multilevel Attention-Based Sample Correlations for Knowledge Distillation

Cited by: 58
Authors
Gou, Jianping [1 ,2 ]
Sun, Liyuan [2 ]
Yu, Baosheng [3 ]
Wan, Shaohua [4 ]
Ou, Weihua [5 ]
Yi, Zhang [6 ]
Affiliations
[1] Southwest Univ, Coll Comp & Informat Sci, Coll Software, Chongqing 400715, Peoples R China
[2] Jiangsu Univ, Sch Comp Sci & Commun Engn, Zhenjiang 212013, Jiangsu, Peoples R China
[3] Univ Sydney, Fac Engn, Sch Comp Sci, Sydney, NSW 2008, Australia
[4] Univ Elect Sci & Technol China, Shenzhen Inst Adv Study, Shenzhen 518110, Peoples R China
[5] Guizhou Normal Univ, Sch Big Data & Comp Sci, Guiyang 550025, Guizhou, Peoples R China
[6] Sichuan Univ, Sch Comp Sci, Chengdu 610065, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Knowledge engineering; Correlation; Computational modeling; Training; Neural networks; Deep learning; Informatics; Knowledge distillation (KD); model compression; relational knowledge; visual recognition;
DOI
10.1109/TII.2022.3209672
Chinese Library Classification (CLC)
TP [automation and computer technology]
Discipline code
0812
Abstract
Recently, model compression has been widely used to deploy cumbersome deep models on resource-limited edge devices in performance-demanding industrial Internet of Things (IoT) scenarios. As a simple yet effective model compression technique, knowledge distillation (KD) aims to transfer knowledge (e.g., sample relationships as relational knowledge) from a large teacher model to a small student model. However, existing relational KD methods usually build sample correlations directly from the feature maps at a certain middle layer of a deep neural network, which tends to overfit the teacher's feature maps and fails to focus on the most informative sample regions. Motivated by this observation, we argue that the characteristics of important regions are crucial and, therefore, introduce attention maps to construct sample correlations for knowledge distillation. Specifically, using attention maps from multiple middle layers, attention-based sample correlations are built upon the most informative sample regions and serve as a novel and effective form of relational knowledge for distillation. We refer to the proposed method as multilevel attention-based sample correlations for knowledge distillation (MASCKD). We perform extensive experiments on popular KD datasets for image classification, image retrieval, and person re-identification; the experimental results demonstrate the effectiveness of the proposed method for relational KD.
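To make the abstract's idea concrete, below is a minimal sketch of how attention-based sample correlations could be distilled across multiple middle layers, assuming a PyTorch setting. The attention-map formula (channel-wise activation energy), the Gram-style correlation matrix, and the MSE matching loss are common choices from the attention-transfer and relational-KD literature; the names attention_map, sample_correlation, masckd_like_loss, and lambda_kd are illustrative and not taken from the paper itself.

import torch
import torch.nn.functional as F

def attention_map(feat: torch.Tensor) -> torch.Tensor:
    # Collapse a feature map (B, C, H, W) into a normalized spatial attention map (B, H*W).
    att = feat.pow(2).mean(dim=1)          # channel-wise activation energy: (B, H, W)
    att = att.flatten(1)                   # (B, H*W)
    return F.normalize(att, p=2, dim=1)    # unit L2 norm per sample

def sample_correlation(att: torch.Tensor) -> torch.Tensor:
    # Pairwise sample correlations (B, B) from normalized attention maps.
    return att @ att.t()

def masckd_like_loss(teacher_feats, student_feats):
    # teacher_feats / student_feats: lists of (B, C, H, W) tensors taken from
    # corresponding middle layers of the teacher and student networks.
    # Spatial sizes may differ between the two networks; only the B x B
    # correlation matrices are matched, so no feature alignment is needed.
    loss = 0.0
    for ft, fs in zip(teacher_feats, student_feats):
        ct = sample_correlation(attention_map(ft)).detach()  # teacher relational knowledge (no gradient)
        cs = sample_correlation(attention_map(fs))           # student relational knowledge
        loss = loss + F.mse_loss(cs, ct)
    return loss

In training, such a relational term would typically be added to the student's usual cross-entropy (and, optionally, logit-distillation) loss, e.g., loss = ce_loss + lambda_kd * masckd_like_loss(t_feats, s_feats), where the intermediate features are collected with forward hooks and lambda_kd is a tuning hyperparameter.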
Pages: 7099-7109
Number of pages: 11