ScalableViT: Rethinking the Context-Oriented Generalization of Vision Transformer

Cited by: 20
Authors
Yang, Rui [1]
Ma, Hailong [2]
Wu, Jie [2]
Tang, Yansong [1]
Xiao, Xuefeng [2]
Zheng, Min [2]
Li, Xiu [1]
Affiliations
[1] Tsinghua Univ, Tsinghua Shenzhen Int Grad Sch, Beijing, Peoples R China
[2] ByteDance Inc, Beijing, Peoples R China
Funding
National Key Research and Development Program of China; National Natural Science Foundation of China
Keywords
Vision transformer; Self-attention mechanism; Classification; Detection; Semantic segmentation;
DOI
10.1007/978-3-031-20053-3_28
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The vanilla self-attention mechanism inherently relies on pre-defined, fixed computational dimensions. This inflexibility prevents it from achieving context-oriented generalization, which can bring more contextual cues and global representations. To mitigate this issue, we propose a Scalable Self-Attention (SSA) mechanism that uses two scaling factors to release the dimensions of the query, key, and value matrices while unbinding them from the input. This scalability yields context-oriented generalization and enhances object sensitivity, which pushes the whole network toward a more effective trade-off between accuracy and cost. Furthermore, we propose an Interactive Window-based Self-Attention (IWSA), which establishes interaction between non-overlapping regions by re-merging independent value tokens and aggregating spatial information from adjacent windows. By stacking SSA and IWSA alternately, the Scalable Vision Transformer (ScalableViT) achieves state-of-the-art performance on general-purpose vision tasks. For example, ScalableViT-S outperforms Twins-SVT-S by 1.4% and Swin-T by 1.8% on ImageNet-1K classification.
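Illustrative sketch (not from the paper): the abstract states only that SSA uses two scaling factors to decouple the query, key, and value dimensions from the input. The Python below shows one possible reading of that idea using only the standard library. Everything specific in it is an assumption made for illustration: the names r_n (token-count factor) and r_c (channel factor), the random token-mixing matrix used to shrink the token count, the random weights standing in for learned parameters, and the choice of which matrices each factor applies to. It is not the authors' implementation.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def random_matrix(rows, cols, rng):
    """Random (rows x cols) weights standing in for learned parameters."""
    return [[rng.gauss(0.0, 1.0) / math.sqrt(rows) for _ in range(cols)]
            for _ in range(rows)]


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def scalable_attention(x, r_n, r_c, rng):
    """Single-head attention whose key/value token count and query/key
    channel width are scaled by r_n and r_c (both hypothetical names)."""
    n, c = len(x), len(x[0])
    n_s = max(1, int(n * r_n))  # scaled token count for keys and values
    c_s = max(1, int(c * r_c))  # scaled channel width for queries and keys

    # Assumption: a plain (n_s x n) token mixer shrinks the sequence.
    reduce_tokens = random_matrix(n_s, n, rng)
    x_s = matmul(reduce_tokens, x)              # (n_s, c)

    q = matmul(x, random_matrix(c, c_s, rng))   # (n, c_s)
    k = matmul(x_s, random_matrix(c, c_s, rng))  # (n_s, c_s)
    v = matmul(x_s, random_matrix(c, c, rng))   # (n_s, c)

    scale = 1.0 / math.sqrt(c_s)
    scores = [[scale * sum(a * b for a, b in zip(qi, kj)) for kj in k] for qi in q]
    attn = [softmax(row) for row in scores]     # (n, n_s), rows sum to 1
    out = matmul(attn, v)                       # (n, c): same shape as the input
    return out, attn


if __name__ == "__main__":
    rng = random.Random(0)
    tokens = [[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(16)]
    out, attn = scalable_attention(tokens, r_n=0.25, r_c=0.5, rng=rng)
    print(len(out), len(out[0]), len(attn[0]))
```

In this sketch the attention map has n x (n * r_n) entries instead of n x n, and the query-key dot products run over c * r_c channels, so both factors change the cost of the attention step while the output keeps the input's shape.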
Pages: 480-496
Page count: 17