ScalableViT: Rethinking the Context-Oriented Generalization of Vision Transformer

Cited by: 20

Authors:

Yang, Rui [1]

Ma, Hailong [2]

Wu, Jie [2]

Tang, Yansong [1]

Xiao, Xuefeng [2]

Zheng, Min [2]

Li, Xiu [1]

Affiliations:

[1] Tsinghua Univ, Tsinghua Shenzhen Int Grad Sch, Beijing, Peoples R China

[2] ByteDance Inc, Beijing, Peoples R China

Source:

COMPUTER VISION, ECCV 2022, PT XXIV | 2022 / Vol. 13684

Funding:

National Key Research and Development Program of China; National Natural Science Foundation of China;

Keywords:

Vision transformer; Self-attention mechanism; Classification; Detection; Semantic segmentation;

DOI:

10.1007/978-3-031-20053-3_28

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

The vanilla self-attention mechanism inherently relies on pre-defined and steadfast computational dimensions. Such inflexibility restricts it from possessing context-oriented generalization that can bring more contextual cues and global representations. To mitigate this issue, we propose a Scalable Self-Attention (SSA) mechanism that leverages two scaling factors to release dimensions of query, key, and value matrices while unbinding them with the input. This scalability fetches context-oriented generalization and enhances object sensitivity, which pushes the whole network into a more effective trade-off state between accuracy and cost. Furthermore, we propose an Interactive Window-based Self-Attention (IWSA), which establishes interaction between non-overlapping regions by re-merging independent value tokens and aggregating spatial information from adjacent windows. By stacking the SSA and IWSA alternately, the Scalable Vision Transformer (ScalableViT) achieves state-of-the-art performance on general-purpose vision tasks. For example, ScalableViT-S outperforms Twins-SVT-S by 1.4% and Swin-T by 1.8% on ImageNet-1K classification.
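
The abstract does not say how the two scaling factors are applied, so the following NumPy sketch is only one plausible reading, not the authors' implementation. It assumes a spatial factor `r_n` that shrinks the number of key/value tokens and a channel factor `r_c` that changes the query/key width. The names `r_n`, `r_c`, `scalable_self_attention`, the random weights, and the token-mixing projection `p` are all hypothetical stand-ins for learned layers.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def scalable_self_attention(x, r_n, r_c, rng):
    """Single-head attention whose Q/K/V sizes are decoupled from the input.

    x   : (n, c) token matrix
    r_n : assumed spatial scaling factor for the key/value token count
    r_c : assumed channel scaling factor for the query/key width
    """
    n, c = x.shape
    n_s = max(1, int(n * r_n))  # scaled number of key/value tokens
    c_s = max(1, int(c * r_c))  # scaled query/key channel width

    # Random matrices stand in for learned projections (assumption).
    w_q = rng.standard_normal((c, c_s)) / np.sqrt(c)
    w_k = rng.standard_normal((c, c_s)) / np.sqrt(c)
    w_v = rng.standard_normal((c, c)) / np.sqrt(c)
    # Hypothetical spatial reduction: a linear map over tokens, n -> n_s.
    p = rng.standard_normal((n_s, n)) / np.sqrt(n)

    q = x @ w_q          # (n, c_s)
    k = (p @ x) @ w_k    # (n_s, c_s)
    v = (p @ x) @ w_v    # (n_s, c)

    attn = softmax(q @ k.T / np.sqrt(c_s), axis=-1)  # (n, n_s)
    return attn @ v, attn                            # output is (n, c)


rng = np.random.default_rng(0)
tokens = rng.standard_normal((16, 8))
out, attn = scalable_self_attention(tokens, r_n=0.25, r_c=0.5, rng=rng)
print(out.shape, attn.shape)  # (16, 8) (16, 4)
```

Under these assumptions the attention map costs O(n · n_s · c_s) instead of O(n² · c), which illustrates the accuracy/cost trade-off the abstract refers to; the output keeps the input shape, so the block can be stacked.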

Pages: 480-496

Page count: 17
