ScalableViT: Rethinking the Context-Oriented Generalization of Vision Transformer

Cited by: 20
Authors
Yang, Rui [1]
Ma, Hailong [2]
Wu, Jie [2]
Tang, Yansong [1]
Xiao, Xuefeng [2]
Zheng, Min [2]
Li, Xiu [1]
Affiliations
[1] Tsinghua Univ, Tsinghua Shenzhen Int Grad Sch, Beijing, Peoples R China
[2] ByteDance Inc, Beijing, Peoples R China
Source
Funding
National Key Research and Development Program of China; National Natural Science Foundation of China
Keywords
Vision transformer; Self-attention mechanism; Classification; Detection; Semantic segmentation;
DOI
10.1007/978-3-031-20053-3_28
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The vanilla self-attention mechanism inherently relies on pre-defined and steadfast computational dimensions. Such inflexibility restricts it from possessing context-oriented generalization that can bring more contextual cues and global representations. To mitigate this issue, we propose a Scalable Self-Attention (SSA) mechanism that leverages two scaling factors to release the dimensions of the query, key, and value matrices while unbinding them from the input. This scalability fetches context-oriented generalization and enhances object sensitivity, which pushes the whole network into a more effective trade-off state between accuracy and cost. Furthermore, we propose an Interactive Window-based Self-Attention (IWSA), which establishes interaction between non-overlapping regions by re-merging independent value tokens and aggregating spatial information from adjacent windows. By stacking the SSA and IWSA alternately, the Scalable Vision Transformer (ScalableViT) achieves state-of-the-art performance on general-purpose vision tasks. For example, ScalableViT-S outperforms Twins-SVT-S by 1.4% and Swin-T by 1.8% on ImageNet-1K classification.
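The idea of decoupling the query, key, and value dimensions from the input can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The abstract does not say how the two scaling factors are applied, so the sketch assumes one factor (`r_n`, hypothetical name) shrinks the number of key/value tokens and the other (`r_c`, hypothetical name) shrinks the query/key channel width. Random matrices stand in for learned weights, and a plain linear token-mixing projection is an assumption made only for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def scalable_self_attention(x, r_n=0.25, r_c=0.5, seed=0):
    """Illustrative attention with scaled token and channel dimensions.

    x   : (N, C) array of N tokens with C channels.
    r_n : assumed spatial scaling factor for the key/value token count.
    r_c : assumed channel scaling factor for the query/key width.
    Returns the (N, C) output and the (N, n_s) attention map.
    """
    n, c = x.shape
    n_s = max(1, int(n * r_n))  # scaled number of key/value tokens
    c_s = max(1, int(c * r_c))  # scaled query/key channel width

    # Random stand-ins for learned projection weights (assumption).
    rng = np.random.default_rng(seed)
    w_q = rng.standard_normal((c, c_s)) / np.sqrt(c)
    w_k = rng.standard_normal((c, c_s)) / np.sqrt(c)
    w_v = rng.standard_normal((c, c)) / np.sqrt(c)
    w_n = rng.standard_normal((n_s, n)) / np.sqrt(n)  # mixes N tokens into n_s

    q = x @ w_q          # (N, c_s)
    x_s = w_n @ x        # (n_s, C) spatially scaled tokens
    k = x_s @ w_k        # (n_s, c_s)
    v = x_s @ w_v        # (n_s, C)

    attn = softmax(q @ k.T / np.sqrt(c_s), axis=-1)  # (N, n_s)
    return attn @ v, attn                            # output is (N, C)


if __name__ == "__main__":
    tokens = np.random.default_rng(1).standard_normal((16, 8))
    out, attn = scalable_self_attention(tokens)
    print(out.shape, attn.shape)
```

Under these assumptions the attention map has shape N x (N * r_n) rather than N x N, so its cost shrinks with `r_n`, while the output keeps the input shape (N, C).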
Pages: 480-496
Number of pages: 17