ScalableViT: Rethinking the Context-Oriented Generalization of Vision Transformer

Cited by: 20
Authors
Yang, Rui [1]
Ma, Hailong [2]
Wu, Jie [2]
Tang, Yansong [1]
Xiao, Xuefeng [2]
Zheng, Min [2]
Li, Xiu [1]
Affiliations
[1] Tsinghua Univ, Tsinghua Shenzhen Int Grad Sch, Beijing, Peoples R China
[2] ByteDance Inc, Beijing, Peoples R China
Funding
National Key Research and Development Program of China; National Natural Science Foundation of China;
Keywords
Vision transformer; Self-attention mechanism; Classification; Detection; Semantic segmentation;
DOI
10.1007/978-3-031-20053-3_28
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The vanilla self-attention mechanism inherently relies on pre-defined and steadfast computational dimensions. Such inflexibility restricts it from possessing context-oriented generalization that can bring more contextual cues and global representations. To mitigate this issue, we propose a Scalable Self-Attention (SSA) mechanism that leverages two scaling factors to release dimensions of query, key, and value matrices while unbinding them with the input. This scalability fetches context-oriented generalization and enhances object sensitivity, which pushes the whole network into a more effective trade-off state between accuracy and cost. Furthermore, we propose an Interactive Window-based Self-Attention (IWSA), which establishes interaction between non-overlapping regions by re-merging independent value tokens and aggregating spatial information from adjacent windows. By stacking the SSA and IWSA alternately, the Scalable Vision Transformer (ScalableViT) achieves state-of-the-art performance on general-purpose vision tasks. For example, ScalableViT-S outperforms Twins-SVT-S by 1.4% and Swin-T by 1.8% on ImageNet-1K classification.
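The idea of two scaling factors that decouple the query, key, and value dimensions from the input size can be illustrated with a rough sketch. This is not the authors' implementation: the names `r_n` and `r_c`, the average-pooling step used to shrink the key/value token count, and the random projection weights are all assumptions made for illustration only.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def scalable_attention_sketch(x, r_n, r_c, rng):
    """Toy single-head attention with two hypothetical scaling factors.

    x   : (N, C) array of N input tokens with C channels.
    r_n : assumed factor scaling the number of key/value tokens.
    r_c : assumed factor scaling the channel width of q, k, and v.
    rng : NumPy random generator; random weights stand in for learned ones.
    """
    n, c = x.shape
    n_kv = min(n, max(1, int(n * r_n)))  # key/value token count after scaling
    c_s = max(1, int(c * r_c))           # q/k/v channel width after scaling

    # Random projections (assumption: a real model would learn these).
    w_q = rng.standard_normal((c, c_s)) / np.sqrt(c)
    w_k = rng.standard_normal((c, c_s)) / np.sqrt(c)
    w_v = rng.standard_normal((c, c_s)) / np.sqrt(c)
    w_o = rng.standard_normal((c_s, c)) / np.sqrt(c_s)

    # Spatial scaling by averaging contiguous groups of tokens
    # (assumption: chosen here only because it is simple to write).
    groups = np.array_split(x, n_kv, axis=0)
    x_kv = np.stack([g.mean(axis=0) for g in groups])  # (n_kv, C)

    q = x @ w_q      # (N, c_s)
    k = x_kv @ w_k   # (n_kv, c_s)
    v = x_kv @ w_v   # (n_kv, c_s)

    attn = softmax(q @ k.T / np.sqrt(c_s), axis=-1)  # (N, n_kv)
    out = (attn @ v) @ w_o                           # back to (N, C)
    return out, attn
```

With 16 tokens of 8 channels, `r_n = 0.25` and `r_c = 1.5`, the attention map in this sketch has shape (16, 4) and the intermediate channel width is 12, while the output keeps the input shape (16, 8). The attention cost therefore depends on the chosen factors rather than on the input size alone.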
Pages: 480-496
Page count: 17