SSIR: Spatial shuffle multi-head self-attention for Single Image Super-Resolution

Cited by: 14
Authors
Zhao, Liangliang [1 ,2 ]
Gao, Junyu [1 ,2 ,3 ]
Deng, Donghu [1 ,2 ]
Li, Xuelong [1 ,2 ]
Affiliations
[1] Northwestern Polytech Univ, Sch Artificial Intelligence Opt & Elect iOPEN, Xian 710072, Shaanxi, Peoples R China
[2] Minist Ind & Informat Technol, Key Lab Intelligent Interact & Applicat, Xian 710072, Shaanxi, Peoples R China
[3] Shanghai Artificial Intelligence Lab, Shanghai 200232, Peoples R China
Keywords
Single Image Super-Resolution; Long-range attention; Vision transformer
DOI
10.1016/j.patcog.2023.110195
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
Benefiting from the development of deep convolutional neural networks, CNN-based single image super-resolution methods have achieved remarkable reconstruction results. However, the limited receptive field of the convolutional kernel and the use of static weights during inference constrain the performance of CNN-based methods. Recently, several vision transformer-based image super-resolution methods have achieved excellent performance compared to CNN-based methods, but they contain many parameters and require vast amounts of GPU memory for training. In this paper, we propose a spatial shuffle multi-head self-attention for single image super-resolution that effectively models long-range pixel dependencies without additional computational cost. A local perception module is also proposed to incorporate the local connectivity and translational invariance of convolutional neural networks. Reconstruction results on five popular benchmarks show that the proposed method outperforms existing methods in both reconstruction accuracy and visual quality. The proposed method matches the performance of transformer-based methods while requiring fewer transformer blocks, reducing the number of parameters by 40%, GPU memory by 30%, and inference time by 30%.
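The abstract describes the mechanism only at a high level, so the following is a minimal PyTorch sketch of one plausible reading of spatial shuffle attention: feature-map tokens are permuted across windows before ordinary windowed multi-head self-attention, so each window attends to pixels gathered from distant locations at the same FLOP cost as standard window attention. All names here (spatial_shuffle, ShuffleWindowAttention, window_size) are illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch only: one plausible spatial-shuffle window attention,
# NOT the authors' SSIR code. Assumes H and W divide evenly by window_size.
import torch
import torch.nn as nn


def spatial_shuffle(x, window_size):
    """Interleave spatial positions (spatial analogue of channel shuffle).

    x: (B, H, W, C). After shuffling, a window_size x window_size window
    collects pixels that were spaced window_size apart in the original map.
    """
    B, H, W, C = x.shape
    g_h, g_w = H // window_size, W // window_size
    x = x.view(B, g_h, window_size, g_w, window_size, C)
    x = x.permute(0, 2, 1, 4, 3, 5).reshape(B, H, W, C)
    return x


def spatial_unshuffle(x, window_size):
    """Inverse permutation of spatial_shuffle, restoring pixel order."""
    B, H, W, C = x.shape
    g_h, g_w = H // window_size, W // window_size
    x = x.view(B, window_size, g_h, window_size, g_w, C)
    x = x.permute(0, 2, 1, 4, 3, 5).reshape(B, H, W, C)
    return x


class ShuffleWindowAttention(nn.Module):
    """Windowed MHSA computed in the shuffled layout (hypothetical module).

    Costs the same FLOPs as plain window attention because only the token
    permutation changes, not the attention computation itself.
    """

    def __init__(self, dim, num_heads, window_size):
        super().__init__()
        self.window_size = window_size
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, x):  # x: (B, H, W, C)
        B, H, W, C = x.shape
        ws = self.window_size
        x = spatial_shuffle(x, ws)
        # Partition into non-overlapping ws x ws windows of distant pixels.
        x = x.view(B, H // ws, ws, W // ws, ws, C)
        x = x.permute(0, 1, 3, 2, 4, 5).reshape(-1, ws * ws, C)
        x, _ = self.attn(x, x, x)  # self-attention within each window
        # Undo the window partition, then the shuffle.
        x = x.view(B, H // ws, W // ws, ws, ws, C)
        x = x.permute(0, 1, 3, 2, 4, 5).reshape(B, H, W, C)
        return spatial_unshuffle(x, ws)


# Example: a 48x48 feature map with 64 channels and 8x8 windows.
feat = torch.randn(2, 48, 48, 64)
out = ShuffleWindowAttention(dim=64, num_heads=4, window_size=8)(feat)
print(out.shape)  # torch.Size([2, 48, 48, 64])
```

Alternating such shuffled windows with plain window attention, and pairing them with a convolutional local perception branch (e.g. a depthwise convolution), would mirror the CNN/transformer hybrid the abstract describes; the exact arrangement in SSIR may differ.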
Pages: 12
Related Papers (items 21-30 of 50 shown)
  • [21] Lightweight Single Image Super-Resolution With Multi-Scale Spatial Attention Networks
    Soh, Jae Woong
    Cho, Nam Ik
    IEEE ACCESS, 2020, 8 : 35383 - 35391
  • [22] Neural News Recommendation with Multi-Head Self-Attention
    Wu, Chuhan
    Wu, Fangzhao
    Ge, Suyu
    Qi, Tao
    Huang, Yongfeng
    Xie, Xing
    2019 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING AND THE 9TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING (EMNLP-IJCNLP 2019): PROCEEDINGS OF THE CONFERENCE, 2019, : 6389 - 6394
  • [23] Multi-Window Fusion Spatial-Frequency Joint Self-Attention for Remote-Sensing Image Super-Resolution
    Li, Ziang
    Lu, Wen
    Wang, Zhaoyang
    Hu, Jian
    Zhang, Zeming
    He, Lihuo
    REMOTE SENSING, 2024, 16 (19)
  • [24] Image Super-Resolution Reconstruction Method Based on Self-Attention Deep Network
    Chen, Zihan
    Wu, Haobo
    Pei, Haodong
    Chen, Rong
    Hu, Jiaxin
    Shi, Hengtong
    LASER & OPTOELECTRONICS PROGRESS, 2021, 58 (04)
  • [25] Densely Connected Transformer With Linear Self-Attention for Lightweight Image Super-Resolution
    Zeng, Kun
    Lin, Hanjiang
    Yan, Zhiqiang
    Fang, Jinsheng
    IEEE TRANSACTIONS ON INSTRUMENTATION AND MEASUREMENT, 2023, 72
  • [26] FNSAM: Image super-resolution using a feedback network with self-attention mechanism
    Huang, Yu
    Wang, Wenqian
    Li, Min
    TECHNOLOGY AND HEALTH CARE, 2023, 31 : S383 - S395
  • [27] Extreme Low Resolution Action Recognition with Spatial-Temporal Multi-Head Self-Attention and Knowledge Distillation
    Purwanto, Didik
    Pramono, Rizard Renanda Adhi
    Chen, Yie-Tarng
    Fang, Wen-Hsien
    2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION WORKSHOPS (ICCVW), 2019, : 961 - 969
  • [28] Self-attention learning network for face super-resolution
    Zeng, Kangli
    Wang, Zhongyuan
    Lu, Tao
    Chen, Jianyu
    Wang, Jiaming
    Xiong, Zixiang
    NEURAL NETWORKS, 2023, 160 : 164 - 174
  • [29] Multi-scale feature learning network with channel self-attention for remote sensing single-image super-resolution
    Wang, Xueqin
    Jiang, Wenzong
    Zhao, Lifei
    Liu, Baodi
    Wang, Yanjiang
    INTERNATIONAL JOURNAL OF REMOTE SENSING, 2022, 43 (18) : 6669 - 6688
  • [30] Masked multi-head self-attention for causal speech enhancement
    Nicolson, Aaron
    Paliwal, Kuldip K.
    SPEECH COMMUNICATION, 2020, 125 : 80 - 96