ScalableViT: Rethinking the Context-Oriented Generalization of Vision Transformer

Cited by: 20

Authors:

Yang, Rui [1]

Ma, Hailong [2]

Wu, Jie [2]

Tang, Yansong [1]

Xiao, Xuefeng [2]

Zheng, Min [2]

Li, Xiu [1]

Affiliations:

[1] Tsinghua Univ, Tsinghua Shenzhen Int Grad Sch, Beijing, Peoples R China

[2] ByteDance Inc, Beijing, Peoples R China

Source:

COMPUTER VISION, ECCV 2022, PT XXIV | 2022 / Vol. 13684

Funding:

National Key Research and Development Program of China; National Natural Science Foundation of China;

Keywords:

Vision transformer; Self-attention mechanism; Classification; Detection; Semantic segmentation;

DOI:

10.1007/978-3-031-20053-3_28

Chinese Library Classification (CLC) number:

TP18 [Artificial Intelligence Theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

The vanilla self-attention mechanism inherently relies on pre-defined and steadfast computational dimensions. Such inflexibility restricts it from possessing context-oriented generalization that can bring more contextual cues and global representations. To mitigate this issue, we propose a Scalable Self-Attention (SSA) mechanism that leverages two scaling factors to release dimensions of query, key, and value matrices while unbinding them with the input. This scalability fetches context-oriented generalization and enhances object sensitivity, which pushes the whole network into a more effective trade-off state between accuracy and cost. Furthermore, we propose an Interactive Window-based Self-Attention (IWSA), which establishes interaction between non-overlapping regions by re-merging independent value tokens and aggregating spatial information from adjacent windows. By stacking the SSA and IWSA alternately, the Scalable Vision Transformer (ScalableViT) achieves state-of-the-art performance on general-purpose vision tasks. For example, ScalableViT-S outperforms Twins-SVT-S by 1.4% and Swin-T by 1.8% on ImageNet-1K classification.

Pages: 480-496

Page count: 17

