Improving Vision Transformers with Nested Multi-head Attentions

Times Cited: 2
Authors
Peng, Jiquan [1,2]
Li, Chaozhuo [3]
Zhao, Yi [1,2]
Lin, Yuting [1,2]
Fang, Xiaohan [1,2]
Gong, Jibing [1,2]
Affiliations
[1] Yanshan Univ, Sch Informat Sci & Engn, Qinhuangdao, Peoples R China
[2] Key Lab Comp Virtual Technol & Syst Integrat Hebe, Shijiazhuang, Peoples R China
[3] Microsoft Res Asia, Beijing, Peoples R China
Keywords
Vision Transformers; Disentangled Representation
DOI
10.1109/ICME55011.2023.00330
Chinese Library Classification (CLC)
TP18 [Theory of Artificial Intelligence]
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
Vision transformers have significantly advanced the field of computer vision in recent years. The cornerstone of these transformers is the multi-head attention mechanism, which models interactions between visual elements within a feature map. However, the vanilla multi-head attention paradigm independently learns parameters for each head, which ignores crucial interactions across different attention heads and may result in redundancy and under-utilization of the model's capacity. To enhance model expressiveness, we propose a novel nested attention mechanism, Ne-Att, that explicitly models cross-head interactions via a hierarchical variational distribution. We conducted extensive experiments on image classification, and the results demonstrate the superiority of Ne-Att.
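To make the baseline discussed in the abstract concrete, below is a minimal NumPy sketch of vanilla multi-head self-attention, in which each head learns its own projection matrices independently. All function and parameter names are illustrative; the sketch deliberately does not reproduce Ne-Att's hierarchical variational coupling across heads, which is the paper's contribution.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def vanilla_multi_head_attention(x, params):
    """Vanilla multi-head self-attention: each head h applies its own,
    independently learned projections (W_q, W_k, W_v). In this sketch the
    heads interact only through the final concatenation (a full transformer
    block would also apply a shared output projection, omitted here)."""
    heads = []
    for W_q, W_k, W_v in params:              # one parameter triple per head
        q, k, v = x @ W_q, x @ W_k, x @ W_v
        scores = q @ k.T / np.sqrt(q.shape[-1])
        heads.append(softmax(scores) @ v)     # per-head attention output
    return np.concatenate(heads, axis=-1)

# Toy usage: 4 tokens, model dim 8, 2 heads of dim 4.
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))
params = [tuple(rng.standard_normal((8, 4)) for _ in range(3)) for _ in range(2)]
print(vanilla_multi_head_attention(x, params).shape)  # (4, 8)
```

Because the parameter triples for the two heads are drawn and learned independently, nothing in this formulation encourages the heads to coordinate, which is the redundancy and under-utilization issue the abstract points to.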
Pages: 1925-1930
Page count: 6