Improving Vision Transformers with Nested Multi-head Attentions

Cited by: 2
Authors
Peng, Jiquan [1 ,2 ]
Li, Chaozhuo [3 ]
Zhao, Yi [1 ,2 ]
Lin, Yuting [1 ,2 ]
Fang, Xiaohan [1 ,2 ]
Gong, Jibing [1 ,2 ]
Affiliations
[1] Yanshan Univ, Sch Informat Sci & Engn, Qinhuangdao, Peoples R China
[2] Key Lab Comp Virtual Technol & Syst Integrat Hebei, Shijiazhuang, Peoples R China
[3] Microsoft Res Asia, Beijing, Peoples R China
Keywords
Vision Transformers; Disentangled Representation;
DOI
10.1109/ICME55011.2023.00330
CLC number
TP18 [Artificial Intelligence Theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
Vision transformers have significantly advanced the field of computer vision in recent years. The cornerstone of these transformers is the multi-head attention mechanism, which models interactions between visual elements within a feature map. However, the vanilla multi-head attention paradigm independently learns parameters for each head, which ignores crucial interactions across different attention heads and may result in redundancy and under-utilization of the model's capacity. To enhance model expressiveness, we propose a novel nested attention mechanism, Ne-Att, that explicitly models cross-head interactions via a hierarchical variational distribution. We conducted extensive experiments on image classification, and the results demonstrate the superiority of Ne-Att.
Pages: 1925-1930
Number of pages: 6
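
The abstract describes replacing independently parameterized attention heads with a formulation in which heads interact through a shared hierarchy. As a rough illustration only, the PyTorch sketch below couples heads through one shared "parent" projection plus small per-head offsets; the class name CrossHeadAttention, the offset parameterization, and all dimensions are assumptions for illustration, not the paper's Ne-Att, which models cross-head interaction via a hierarchical variational distribution.

```python
# Illustrative sketch only (assumption: PyTorch). The coupling below
# (shared projection + per-head offsets) is a simplified stand-in for
# cross-head interaction; it is NOT the paper's Ne-Att formulation.
import torch
import torch.nn as nn


class CrossHeadAttention(nn.Module):
    """Multi-head self-attention in which heads are not fully independent:
    every head derives its Q/K/V from one shared projection plus a small
    head-specific offset (hypothetical coupling scheme)."""

    def __init__(self, dim: int, num_heads: int):
        super().__init__()
        assert dim % num_heads == 0
        self.num_heads = num_heads
        self.head_dim = dim // num_heads
        # Level 1 of the hierarchy: a single projection shared by all heads.
        self.shared_qkv = nn.Linear(dim, 3 * dim, bias=False)
        # Level 2: per-head offsets, initialized at zero so heads start
        # identical to the shared parent and only gradually specialize.
        self.head_offsets = nn.Parameter(torch.zeros(3, num_heads, self.head_dim))
        self.proj = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        B, N, C = x.shape
        qkv = self.shared_qkv(x).reshape(B, N, 3, self.num_heads, self.head_dim)
        qkv = qkv + self.head_offsets.view(1, 1, 3, self.num_heads, self.head_dim)
        q, k, v = qkv.permute(2, 0, 3, 1, 4)        # each: (B, heads, N, head_dim)
        attn = (q @ k.transpose(-2, -1)) / self.head_dim ** 0.5
        attn = attn.softmax(dim=-1)                 # attention weights per head
        out = (attn @ v).transpose(1, 2).reshape(B, N, C)
        return self.proj(out)


if __name__ == "__main__":
    tokens = torch.randn(2, 16, 64)                 # (batch, patches, embed dim)
    print(CrossHeadAttention(dim=64, num_heads=8)(tokens).shape)  # torch.Size([2, 16, 64])
```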