ScalableViT: Rethinking the Context-Oriented Generalization of Vision Transformer

Cited by: 20
Authors
Yang, Rui [1]
Ma, Hailong [2]
Wu, Jie [2]
Tang, Yansong [1]
Xiao, Xuefeng [2]
Zheng, Min [2]
Li, Xiu [1]
Affiliations
[1] Tsinghua Univ, Tsinghua Shenzhen Int Grad Sch, Beijing, Peoples R China
[2] ByteDance Inc, Beijing, Peoples R China
Funding
National Key Research and Development Program of China; National Natural Science Foundation of China;
Keywords
Vision transformer; Self-attention mechanism; Classification; Detection; Semantic segmentation;
DOI
10.1007/978-3-031-20053-3_28
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The vanilla self-attention mechanism inherently relies on pre-defined and steadfast computational dimensions. Such inflexibility restricts it from possessing context-oriented generalization that can bring more contextual cues and global representations. To mitigate this issue, we propose a Scalable Self-Attention (SSA) mechanism that leverages two scaling factors to release dimensions of query, key, and value matrices while unbinding them with the input. This scalability fetches context-oriented generalization and enhances object sensitivity, which pushes the whole network into a more effective trade-off state between accuracy and cost. Furthermore, we propose an Interactive Window-based Self-Attention (IWSA), which establishes interaction between non-overlapping regions by re-merging independent value tokens and aggregating spatial information from adjacent windows. By stacking the SSA and IWSA alternately, the Scalable Vision Transformer (ScalableViT) achieves state-of-the-art performance on general-purpose vision tasks. For example, ScalableViT-S outperforms Twins-SVT-S by 1.4% and Swin-T by 1.8% on ImageNet-1K classification.
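The idea of decoupling the attention dimensions from the input with two scaling factors can be illustrated with a rough NumPy sketch. This is not the authors' implementation: the names `r_n` and `r_c`, the average pooling used to shrink the key/value tokens, and the random matrices standing in for learned projections are all assumptions made for illustration only.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def scalable_self_attention(x, r_n, r_c, rng):
    """Toy single-head attention with two scaling factors.

    x   : (N, C) array of N tokens with C channels.
    r_n : hypothetical factor scaling the number of key/value tokens.
    r_c : hypothetical factor scaling the channel width of Q, K, V.
    """
    n, c = x.shape
    n_s = min(n, max(1, int(n * r_n)))   # scaled token count for K and V
    c_s = max(1, int(c * r_c))           # scaled channel width

    # Random matrices stand in for learned projection weights (assumption).
    w_q = rng.standard_normal((c, c_s)) / np.sqrt(c)
    w_k = rng.standard_normal((c, c_s)) / np.sqrt(c)
    w_v = rng.standard_normal((c, c_s)) / np.sqrt(c)
    w_o = rng.standard_normal((c_s, c)) / np.sqrt(c_s)

    # Assumed spatial reduction: average contiguous groups of tokens.
    groups = np.array_split(np.arange(n), n_s)
    x_red = np.stack([x[g].mean(axis=0) for g in groups])   # (n_s, C)

    q = x @ w_q          # (N, c_s): queries keep every input token
    k = x_red @ w_k      # (n_s, c_s)
    v = x_red @ w_v      # (n_s, c_s)

    attn = softmax(q @ k.T / np.sqrt(c_s), axis=-1)          # (N, n_s)
    out = (attn @ v) @ w_o                                   # back to (N, C)
    return out, attn
```

In this sketch the attention map has shape (N, n_s) rather than (N, N), so its cost shrinks with `r_n`, while `r_c` sets the width of the intermediate query, key, and value matrices independently of the input channel count.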
Pages: 480-496
Page count: 17