Improving Vision Transformers with Nested Multi-head Attentions

Cited by: 2
Authors
Peng, Jiquan [1, 2]
Li, Chaozhuo [3]
Zhao, Yi [1, 2]
Lin, Yuting [1, 2]
Fang, Xiaohan [1, 2]
Gong, Jibing [1, 2]
Affiliations
[1] Yanshan Univ, Sch Informat Sci & Engn, Qinhuangdao, Peoples R China
[2] Key Lab Comp Virtual Technol & Syst Integrat Hebe, Shijiazhuang, Peoples R China
[3] Microsoft Res Asia, Beijing, Peoples R China
Keywords
Vision Transformers; Disentangled Representation
DOI
10.1109/ICME55011.2023.00330
Chinese Library Classification number
TP18 [Artificial Intelligence Theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Vision transformers have significantly advanced the field of computer vision in recent years. The cornerstone of these transformers is the multi-head attention mechanism, which models interactions between visual elements within a feature map. However, the vanilla multi-head attention paradigm independently learns parameters for each head, which ignores crucial interactions across different attention heads and may result in redundancy and under-utilization of the model's capacity. To enhance model expressiveness, we propose a novel nested attention mechanism, Ne-Att, that explicitly models cross-head interactions via a hierarchical variational distribution. We conducted extensive experiments on image classification, and the results demonstrate the superiority of Ne-Att.
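For orientation, the vanilla multi-head attention that the abstract describes can be sketched as follows. This is a minimal, dependency-free illustration of standard scaled dot-product attention, in which each head has its own projection matrices and never sees the other heads; it is not the Ne-Att mechanism, whose details are not given in this record. All names (`multi_head_attention`, `random_matrix`, the toy sizes) are assumptions made for the example.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def random_matrix(rng, rows, cols, scale=0.1):
    """Hypothetical helper: Gaussian-initialised weight matrix."""
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


def multi_head_attention(x, heads, w_o):
    """Vanilla multi-head self-attention.

    x:     n_tokens x d_model input features
    heads: one dict per head with its own w_q, w_k, w_v (d_model x d_head)
    w_o:   (n_heads * d_head) x d_model output projection
    """
    d_head = len(heads[0]["w_q"][0])
    head_outputs, attn_maps = [], []
    for head in heads:
        # Each head projects the input with parameters it does not share
        # with, or condition on, any other head.
        q = matmul(x, head["w_q"])
        k = matmul(x, head["w_k"])
        v = matmul(x, head["w_v"])
        k_t = [list(col) for col in zip(*k)]
        scores = matmul(q, k_t)
        attn = [softmax([s / math.sqrt(d_head) for s in row]) for row in scores]
        head_outputs.append(matmul(attn, v))
        attn_maps.append(attn)
    # Heads only meet here, when their outputs are concatenated per token
    # and mixed by the output projection.
    concat = [sum((h[i] for h in head_outputs), []) for i in range(len(x))]
    return matmul(concat, w_o), attn_maps


rng = random.Random(0)
n_tokens, d_model, n_heads = 6, 8, 2
d_head = d_model // n_heads
x = random_matrix(rng, n_tokens, d_model, scale=1.0)
heads = [
    {name: random_matrix(rng, d_model, d_head) for name in ("w_q", "w_k", "w_v")}
    for _ in range(n_heads)
]
w_o = random_matrix(rng, n_heads * d_head, d_model)
out, attn_maps = multi_head_attention(x, heads, w_o)
```

In this sketch the per-head attention maps are computed in fully separate loop iterations, which is the independence the abstract identifies as a possible source of redundancy across heads.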
Pages: 1925-1930
Number of pages: 6