A generic shared attention mechanism for various backbone neural networks

Cited by: 1
Authors
Huang, Zhongzhan [1]
Liang, Senwei [2]
Liang, Mingfu [3]
Affiliations
[1] Sun Yat Sen Univ, Guangzhou 510275, Peoples R China
[2] Purdue Univ, W Lafayette, IN 47906 USA
[3] Northwestern Univ, Evanston, IL 60201 USA
Funding
National Natural Science Foundation of China;
Keywords
Layer-wise shared attention mechanism; Parameter sharing; Dense-and-implicit connection; Stable training;
DOI
10.1016/j.neucom.2024.128697
Chinese Library Classification (CLC)
TP18 [Artificial intelligence theory];
Discipline codes
081104; 0812; 0835; 1405;
Abstract
The self-attention mechanism is crucial for enhancing the performance of various backbone neural networks. However, current methods add self-attention modules (SAMs) to each network layer without fully exploiting their potential, which yields suboptimal performance and growing parameter overhead as network depth increases. In this paper, we reveal an inherent phenomenon: SAMs produce highly correlated attention maps across layers, with an average Pearson correlation coefficient of 0.85. Motivated by this observation, we propose Dense-and-Implicit Attention (DIA), which shares SAMs across layers and uses a long short-term memory (LSTM) module to calibrate and connect the correlated attention maps, improving parameter efficiency. This design is also consistent with the dynamical-systems view of neural networks. Extensive experiments show that DIA consistently enhances backbones such as ResNet, Transformer, and UNet on tasks including image classification, object detection, and image generation with diffusion models. Our analysis indicates that DIA's effectiveness stems from dense inter-layer information connections that are absent in conventional attention mechanisms; these connections stabilize training and provide a regularization effect. These insights deepen the understanding and optimization of attention mechanisms and pave the way for future developments across diverse neural networks.
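The abstract describes DIA only at a high level. The sketch below illustrates the core idea in PyTorch-style pseudocode: a single channel-attention module shared by all residual blocks, with an LSTM cell whose recurrence runs over network depth to calibrate and connect the per-layer attention maps. The module names, the squeeze-and-excitation-style channel attention, and the hidden size are illustrative assumptions for this sketch, not details taken from the paper.

```python
# Minimal sketch (not the authors' code) of a layer-wise shared attention
# module in the spirit of DIA: one attention module plus an LSTM cell is
# reused by every layer, and the LSTM hidden state carries the highly
# correlated attention information from earlier layers to later ones.
import torch
import torch.nn as nn


class SharedDIAAttention(nn.Module):
    """One instance of this module is reused by all layers of the backbone."""

    def __init__(self, channels: int, hidden: int = 128):
        super().__init__()
        self.squeeze = nn.AdaptiveAvgPool2d(1)       # global descriptor per channel
        self.proj_in = nn.Linear(channels, hidden)   # descriptor -> LSTM input
        self.cell = nn.LSTMCell(hidden, hidden)      # calibrates maps across layers
        self.proj_out = nn.Linear(hidden, channels)  # back to per-channel attention
        self.state = None                            # (h, c) kept across layers

    def reset(self):
        """Call once per forward pass of the backbone (per mini-batch)."""
        self.state = None

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        desc = self.squeeze(x).flatten(1)            # (B, C) channel statistics
        inp = torch.relu(self.proj_in(desc))         # (B, hidden)
        if self.state is None:
            self.state = (torch.zeros_like(inp), torch.zeros_like(inp))
        h, cstate = self.cell(inp, self.state)       # recurrence over layers, not time
        self.state = (h, cstate)
        attn = torch.sigmoid(self.proj_out(h))       # (B, C) attention map for this layer
        return x * attn.view(b, c, 1, 1)             # recalibrate the feature map


class ResidualBlockWithDIA(nn.Module):
    """A toy residual block that plugs in the shared attention module."""

    def __init__(self, channels: int, shared_attn: SharedDIAAttention):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.attn = shared_attn                      # same object in every block

    def forward(self, x):
        out = torch.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        out = self.attn(out)                         # attention shared across layers
        return torch.relu(out + x)


if __name__ == "__main__":
    shared = SharedDIAAttention(channels=64)
    blocks = nn.Sequential(*[ResidualBlockWithDIA(64, shared) for _ in range(4)])
    shared.reset()                                   # fresh LSTM state for this pass
    y = blocks(torch.randn(2, 64, 32, 32))
    print(y.shape)                                   # torch.Size([2, 64, 32, 32])
```

Here the LSTM's recurrence runs over layers rather than time steps, which is one way to realize the dense inter-layer information connections the abstract attributes to DIA; the LSTM state is kept as a module attribute purely for simplicity, and the authors' actual implementation may differ.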
Pages: 14
Related papers
50 records in total
  • [1] Central Attention Mechanism for Convolutional Neural Networks
    Geng, Y.X.
    Wang, L.
    Wang, Z.Y.
    Wang, Y.G.
    IAENG International Journal of Computer Science, 2024, 51 (10) : 1642 - 1648
  • [2] Visualization of Convolutional Neural Networks with Attention Mechanism
    Yuan, Meng
    Tie, Bao
    Lin, Dawei
    HUMAN CENTERED COMPUTING, HCC 2021, 2022, 13795 : 82 - 93
  • [3] Probabilistic Attention Map: A Probabilistic Attention Mechanism for Convolutional Neural Networks
    Liu, Yifeng
    Tian, Jing
    SENSORS, 2024, 24 (24)
  • [4] Reducing Test Cases with Attention Mechanism of Neural Networks
    Zhang, Xing
    Chen, Jiongyi
    Feng, Chao
    Li, Ruilin
    Su, Yunfei
    Zhang, Bin
    Lei, Jing
    Tang, Chaojing
    PROCEEDINGS OF THE 30TH USENIX SECURITY SYMPOSIUM, 2021, : 2075 - 2092
  • [5] A Shared Attention Mechanism for Interpretation of Neural Automatic Post-Editing Systems
    Unanue, Inigo Jauregi
    Borzeshi, Ehsan Zare
    Piccardi, Massimo
    NEURAL MACHINE TRANSLATION AND GENERATION, 2018, : 11 - 17
  • [6] Attention mechanism in neural networks: where it comes and where it goes
    Soydaner, Derya
    NEURAL COMPUTING & APPLICATIONS, 2022, 34 (16) : 13371 - 13385
  • [7] Utilizing the Attention Mechanism for Accuracy Prediction in Quantized Neural Networks
    Wei, Lu
    Ma, Zhong
    Yang, Chaojie
    Yao, Qin
    Zheng, Wei
    MATHEMATICS, 2025, 13 (05)
  • [8] A Generalized Attention Mechanism to Enhance the Accuracy Performance of Neural Networks
    Jiang, Pengcheng
    Neri, Ferrante
    Xue, Yu
    Maulik, Ujjwal
    INTERNATIONAL JOURNAL OF NEURAL SYSTEMS, 2024, 34 (12)