ScalableViT: Rethinking the Context-Oriented Generalization of Vision Transformer

Cited by: 20

Authors:

Yang, Rui [1]

Ma, Hailong [2]

Wu, Jie [2]

Tang, Yansong [1]

Xiao, Xuefeng [2]

Zheng, Min [2]

Li, Xiu [1]

Affiliations:

[1] Tsinghua Univ, Tsinghua Shenzhen Int Grad Sch, Beijing, Peoples R China

[2] ByteDance Inc, Beijing, Peoples R China

Source:

COMPUTER VISION, ECCV 2022, PT XXIV | 2022, Vol. 13684

Funding:

National Key Research and Development Program of China; National Natural Science Foundation of China;

Keywords:

Vision transformer; Self-attention mechanism; Classification; Detection; Semantic segmentation;

DOI:

10.1007/978-3-031-20053-3_28

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

The vanilla self-attention mechanism inherently relies on pre-defined and steadfast computational dimensions. Such inflexibility restricts it from possessing context-oriented generalization that can bring more contextual cues and global representations. To mitigate this issue, we propose a Scalable Self-Attention (SSA) mechanism that leverages two scaling factors to release dimensions of query, key, and value matrices while unbinding them with the input. This scalability fetches context-oriented generalization and enhances object sensitivity, which pushes the whole network into a more effective trade-off state between accuracy and cost. Furthermore, we propose an Interactive Window-based Self-Attention (IWSA), which establishes interaction between non-overlapping regions by re-merging independent value tokens and aggregating spatial information from adjacent windows. By stacking the SSA and IWSA alternately, the Scalable Vision Transformer (ScalableViT) achieves state-of-the-art performance on general-purpose vision tasks. For example, ScalableViT-S outperforms Twins-SVT-S by 1.4% and Swin-T by 1.8% on ImageNet-1K classification.

Pages: 480-496

Page count: 17
