A generic shared attention mechanism for various backbone neural networks

Cited by: 1
Authors
Huang, Zhongzhan [1]
Liang, Senwei [2]
Liang, Mingfu [3]
Affiliations
[1] Sun Yat Sen Univ, Guangzhou 510275, Peoples R China
[2] Purdue Univ, W Lafayette, IN 47906 USA
[3] Northwestern Univ, Evanston, IL 60201 USA
Funding
National Natural Science Foundation of China;
Keywords
Layer-wise shared attention mechanism; Parameter sharing; Dense-and-implicit connection; Stable training;
DOI
10.1016/j.neucom.2024.128697
Chinese Library Classification (CLC)
TP18 [Artificial intelligence theory];
Discipline codes
081104; 0812; 0835; 1405;
Abstract
The self-attention mechanism is crucial for enhancing the performance of various backbone neural networks. However, current methods add self-attention modules (SAMs) to each network layer without fully exploiting their potential, which yields suboptimal performance and growing parameter overhead as network depth increases. In this paper, we reveal an inherent phenomenon: SAMs produce highly correlated attention maps across layers, with an average Pearson correlation coefficient of 0.85. Motivated by this observation, we propose Dense-and-Implicit Attention (DIA), which shares SAMs across layers and uses a long short-term memory (LSTM) module to calibrate and connect the correlated attention maps, improving parameter efficiency. This design is also consistent with the dynamical-systems view of neural networks. Extensive experiments show that DIA consistently enhances backbones such as ResNet, Transformer, and UNet on tasks including image classification, object detection, and image generation with diffusion models. Our analysis indicates that DIA's effectiveness stems from dense inter-layer information connections that are absent in conventional attention mechanisms; these connections stabilize training and provide a regularization effect. These insights deepen the understanding and optimization of attention mechanisms and pave the way for future developments across diverse neural networks.
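The abstract describes DIA only at a high level. The sketch below illustrates the core idea in PyTorch-style pseudocode: a single channel-attention module shared by all residual blocks, with an LSTM cell whose recurrence runs over network depth to calibrate and connect the per-layer attention maps. The module names, the squeeze-and-excitation-style channel attention, and the hidden size are illustrative assumptions for this sketch, not details taken from the paper.

```python
# Minimal sketch (not the authors' code) of a layer-wise shared attention
# module in the spirit of DIA: one attention module plus an LSTM cell is
# reused by every layer, and the LSTM hidden state carries the highly
# correlated attention information from earlier layers to later ones.
import torch
import torch.nn as nn


class SharedDIAAttention(nn.Module):
    """One instance of this module is reused by all layers of the backbone."""

    def __init__(self, channels: int, hidden: int = 128):
        super().__init__()
        self.squeeze = nn.AdaptiveAvgPool2d(1)       # global descriptor per channel
        self.proj_in = nn.Linear(channels, hidden)   # descriptor -> LSTM input
        self.cell = nn.LSTMCell(hidden, hidden)      # calibrates maps across layers
        self.proj_out = nn.Linear(hidden, channels)  # back to per-channel attention
        self.state = None                            # (h, c) kept across layers

    def reset(self):
        """Call once per forward pass of the backbone (per mini-batch)."""
        self.state = None

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        desc = self.squeeze(x).flatten(1)            # (B, C) channel statistics
        inp = torch.relu(self.proj_in(desc))         # (B, hidden)
        if self.state is None:
            self.state = (torch.zeros_like(inp), torch.zeros_like(inp))
        h, cstate = self.cell(inp, self.state)       # recurrence over layers, not time
        self.state = (h, cstate)
        attn = torch.sigmoid(self.proj_out(h))       # (B, C) attention map for this layer
        return x * attn.view(b, c, 1, 1)             # recalibrate the feature map


class ResidualBlockWithDIA(nn.Module):
    """A toy residual block that plugs in the shared attention module."""

    def __init__(self, channels: int, shared_attn: SharedDIAAttention):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.attn = shared_attn                      # same object in every block

    def forward(self, x):
        out = torch.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        out = self.attn(out)                         # attention shared across layers
        return torch.relu(out + x)


if __name__ == "__main__":
    shared = SharedDIAAttention(channels=64)
    blocks = nn.Sequential(*[ResidualBlockWithDIA(64, shared) for _ in range(4)])
    shared.reset()                                   # fresh LSTM state for this pass
    y = blocks(torch.randn(2, 64, 32, 32))
    print(y.shape)                                   # torch.Size([2, 64, 32, 32])
```

Here the LSTM's recurrence runs over layers rather than time steps, which is one way to realize the dense inter-layer information connections the abstract attributes to DIA; the LSTM state is kept as a module attribute purely for simplicity, and the authors' actual implementation may differ.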
Pages: 14
Related papers
50 records in total
  • [1] Central Attention Mechanism for Convolutional Neural Networks
    Geng, Y.X.
    Wang, L.
    Wang, Z.Y.
    Wang, Y.G.
    IAENG International Journal of Computer Science, 2024, 51 (10) : 1642 - 1648
  • [2] Visualization of Convolutional Neural Networks with Attention Mechanism
    Yuan, Meng
    Tie, Bao
    Lin, Dawei
    HUMAN CENTERED COMPUTING, HCC 2021, 2022, 13795 : 82 - 93
  • [3] Probabilistic Attention Map: A Probabilistic Attention Mechanism for Convolutional Neural Networks
    Liu, Yifeng
    Tian, Jing
    SENSORS, 2024, 24 (24)
  • [4] Reducing Test Cases with Attention Mechanism of Neural Networks
    Zhang, Xing
    Chen, Jiongyi
    Feng, Chao
    Li, Ruilin
    Su, Yunfei
    Zhang, Bin
    Lei, Jing
    Tang, Chaojing
    PROCEEDINGS OF THE 30TH USENIX SECURITY SYMPOSIUM, 2021, : 2075 - 2092
  • [5] A Shared Attention Mechanism for Interpretation of Neural Automatic Post-Editing Systems
    Unanue, Inigo Jauregi
    Borzeshi, Ehsan Zare
    Piccardi, Massimo
    NEURAL MACHINE TRANSLATION AND GENERATION, 2018, : 11 - 17
  • [6] Attention mechanism in neural networks: where it comes and where it goes
    Soydaner, Derya
    NEURAL COMPUTING & APPLICATIONS, 2022, 34 (16) : 13371 - 13385
  • [7] Utilizing the Attention Mechanism for Accuracy Prediction in Quantized Neural Networks
    Wei, Lu
    Ma, Zhong
    Yang, Chaojie
    Yao, Qin
    Zheng, Wei
    MATHEMATICS, 2025, 13 (05)
  • [8] A Generalized Attention Mechanism to Enhance the Accuracy Performance of Neural Networks
    Jiang, Pengcheng
    Neri, Ferrante
    Xue, Yu
    Maulik, Ujjwal
    INTERNATIONAL JOURNAL OF NEURAL SYSTEMS, 2024, 34 (12)