Co-advise: Cross Inductive Bias Distillation

Cited by: 25
Authors
Ren, Sucheng [1 ,5 ]
Gao, Zhengqi [2 ]
Hua, Tianyu [3 ,5 ]
Xue, Zihui [4 ]
Tian, Yonglong [2 ]
He, Shengfeng [1 ]
Zhao, Hang [3 ,5 ]
Affiliations
[1] South China Univ Technol, Guangzhou, Peoples R China
[2] MIT, Cambridge, MA 02139 USA
[3] Tsinghua Univ, Beijing, Peoples R China
[4] Univ Texas Austin, Austin, TX 78712 USA
[5] Shanghai Qi Zhi Inst, Shanghai, Peoples R China
Funding
National Natural Science Foundation of China
DOI
10.1109/CVPR52688.2022.01627
Chinese Library Classification
TP18 [Theory of Artificial Intelligence]
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
The inductive bias of vision transformers is more relaxed than that of convolutional networks, so they cannot work well with insufficient data. Knowledge distillation is thus introduced to assist the training of transformers. Unlike previous works, where only heavy convolution-based teachers are provided, in this paper we delve into the influence of models' inductive biases in knowledge distillation (e.g., convolution and involution). Our key observation is that teacher accuracy is not the dominant factor behind student accuracy; the teacher's inductive bias matters more. We demonstrate that lightweight teachers with different architectural inductive biases can be used to co-advise the student transformer with outstanding performance. The rationale is that models designed with different inductive biases tend to focus on diverse patterns, so teachers with different inductive biases attain varied knowledge despite being trained on the same dataset. This diverse knowledge provides a more precise and comprehensive description of the data and compounds during distillation to boost the performance of the student. Furthermore, we propose a token inductive bias alignment that aligns the inductive bias of each token with its target teacher model. With only lightweight teachers and this cross inductive bias distillation method, our vision transformers (termed CiT) outperform all previous vision transformers (ViT) of the same architecture on ImageNet. Moreover, our small-size model CiT-SAK further achieves 82.7% Top-1 accuracy on ImageNet without modifying the attention module of the ViT. Code is available at https://github.com/OliverRensu/co-advise.
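The co-advising objective the abstract describes amounts to a per-teacher knowledge-distillation loss on the student transformer. Below is a minimal PyTorch sketch of that idea, assuming one distillation head (logit output) per teacher and a soft, KL-based distillation term; the function and argument names, the temperature tau, and the weighting alpha are illustrative assumptions, not the authors' exact recipe. See the linked repository for the actual implementation.

import torch
import torch.nn.functional as F

def co_advise_loss(student_cls_logits,
                   conv_token_logits, inv_token_logits,
                   conv_teacher_logits, inv_teacher_logits,
                   labels, tau=1.0, alpha=0.5):
    # Supervised term on the student's class token.
    ce = F.cross_entropy(student_cls_logits, labels)

    # Soft distillation: KL divergence between temperature-scaled
    # distributions, one term per teacher, with each teacher matched
    # to its own distillation token (hypothetical pairing).
    def kd(student_logits, teacher_logits):
        return F.kl_div(
            F.log_softmax(student_logits / tau, dim=-1),
            F.softmax(teacher_logits / tau, dim=-1),
            reduction="batchmean",
        ) * tau ** 2

    kd_term = 0.5 * (kd(conv_token_logits, conv_teacher_logits.detach())
                     + kd(inv_token_logits, inv_teacher_logits.detach()))
    return (1.0 - alpha) * ce + alpha * kd_term

# Usage with random tensors (batch of 8, 1000 ImageNet classes):
B, C = 8, 1000
loss = co_advise_loss(torch.randn(B, C), torch.randn(B, C), torch.randn(B, C),
                      torch.randn(B, C), torch.randn(B, C),
                      torch.randint(0, C, (B,)))

The point of the two separate KD terms is the paper's central claim: a convolution teacher and an involution teacher supply complementary views of the same data, so averaging their distillation losses (rather than using one heavy teacher) lets the diverse knowledge compound in the student.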
Pages: 16752-16761
Page count: 10
Related Papers (50 in total)
  • [42] Video Reenactment as Inductive Bias for Content-Motion Disentanglement
    Albarracin, Juan F. Hernandez
    Ramirez Rivera, Adin
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2022, 31 : 2365 - 2374
  • [43] Background Knowledge and Declarative Bias in Inductive Concept-Learning
    Lavrac, N.
    Dzeroski, S.
    LECTURE NOTES IN ARTIFICIAL INTELLIGENCE, 1992, 642 : 51 - 71
  • [44] Human Activity Recognition: A Dynamic Inductive Bias Selection Perspective
    Hamidi, Massinissa
    Osmani, Aomar
    SENSORS, 2021, 21 (21)
  • [45] An inductive bias for slowly changing features in human reinforcement learning
    Hedrich, Noa L.
    Schulz, Eric
    Hall-McMaster, Sam
    Schuck, Nicolas W.
    PLOS COMPUTATIONAL BIOLOGY, 2024, 20 (11)
  • [46] Symbolic inductive bias for visually grounded learning of spoken language
    Chrupala, Grzegorz
    57TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2019), 2019, : 6452 - 6462
  • [47] Inductive bias for semi-supervised extreme learning machine
    Bisio, Federica
    Decherchi, Sergio
    Gastaldo, Paolo
    Zunino, Rodolfo
    NEUROCOMPUTING, 2016, 174 : 154 - 167
  • [48] Disentangling Multi-view Representations Beyond Inductive Bias
    Ke, Guanzhou
    Yu, Yang
    Chao, Guoqing
    Wang, Xiaoli
    Xu, Chenyang
    He, Shengfeng
    PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2023, 2023, : 2582 - 2590
  • [49] REVISITING SPATIAL INDUCTIVE BIAS WITH MLP-LIKE MODEL
    Imamura, Akihiro
    Arizumi, Nana
    2022 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, ICIP, 2022, : 921 - 925
  • [50] Examining the Inductive Bias of Neural Language Models with Artificial Languages
    White, Jennifer C.
    Cotterell, Ryan
    59TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS AND THE 11TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING, VOL 1 (ACL-IJCNLP 2021), 2021, : 454 - 463