Residual adaptive sparse hybrid attention transformer for image super resolution

Times Cited: 0
Authors
Huan, Hai [1 ]
Wang, Mingxuan [1 ]
Affiliations
[1] Nanjing Univ Informat Sci & Technol, Sch Artificial Intelligence, Nanjing 210044, Peoples R China
Keywords
Image super-resolution; Vision transformer; Hybrid attention; Frequency domain loss; Deep learning; MODEL
DOI
10.1016/j.engappai.2024.108990
CLC Number
TP [Automation Technology, Computer Technology]
Discipline Code
0812
Abstract
Image super-resolution is a visual task that reconstructs low-resolution images into high-resolution ones. Currently, many researchers favor applying Transformer-based methods to image super-resolution, which have yielded promising results. However, because they must capture long-range dependencies across the entire image, existing Vision Transformer (ViT) approaches to super-resolution reconstruction incur high computational costs, increasing system overhead. Some researchers have instead proposed manually designed sparse attention mechanisms; however, these approaches acquire receptive fields in a manner similar to traditional convolutions and do not fully exploit the Transformer's strength in extracting global information, resulting in suboptimal reconstruction performance. To leverage the Transformer's ability to capture long-range dependencies, this paper introduces a novel network called RASHAT. In RASHAT, we propose an Adaptive Sparse Hybrid Attention Block (ASHAB). This module introduces Bi-level Routing Attention (BRA) and incorporates both Channel Attention (CA) and (Shifted) Window Multi-head Self-Attention ((S)W-MSA); together, these components capture long-range dependencies, global context, and local dependencies within the image. Additionally, the model employs an Overlapping Cross-Attention Block (OCAB) to enhance information interaction between neighboring pixels. During training, we introduce a novel composite loss function that combines a frequency-domain loss with a pixel loss, further improving model performance. Extensive experiments demonstrate that, benefiting from the sparse attention provided by BRA, RASHAT achieves performance comparable to the current state of the art (20.8M parameters) with significantly fewer parameters (11.6M). These results hold across multiple commonly used datasets.
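The record names the three attention mechanisms inside ASHAB but not how they are wired together. As a point of reference only, the following PyTorch sketch composes two of the standard ingredients, window-based multi-head self-attention (as in Swin-style models) and squeeze-and-excitation-style channel attention; the BRA routing step and the shifted-window variant are omitted, and the module name `HybridAttentionSketch`, the combination order, and all hyperparameters are illustrative assumptions, not the published ASHAB.

```python
import torch
import torch.nn as nn


def window_partition(x: torch.Tensor, ws: int) -> torch.Tensor:
    """Split a (B, H, W, C) map into (B*num_windows, ws*ws, C) token groups."""
    B, H, W, C = x.shape
    x = x.reshape(B, H // ws, ws, W // ws, ws, C)
    return x.permute(0, 1, 3, 2, 4, 5).reshape(-1, ws * ws, C)


def window_reverse(w: torch.Tensor, ws: int, H: int, W: int) -> torch.Tensor:
    """Inverse of window_partition, back to (B, H, W, C)."""
    B = w.shape[0] // ((H // ws) * (W // ws))
    w = w.reshape(B, H // ws, W // ws, ws, ws, -1)
    return w.permute(0, 1, 3, 2, 4, 5).reshape(B, H, W, -1)


class HybridAttentionSketch(nn.Module):
    """Hypothetical hybrid block: window MSA followed by channel re-weighting.

    NOT the published ASHAB; BRA is omitted and the layout is assumed.
    """

    def __init__(self, dim: int = 64, ws: int = 8, heads: int = 4):
        super().__init__()
        self.ws = ws
        self.norm = nn.LayerNorm(dim)
        self.msa = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.ca = nn.Sequential(  # squeeze-and-excitation style channel attention
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(dim, dim // 16, 1), nn.ReLU(inplace=True),
            nn.Conv2d(dim // 16, dim, 1), nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, C, H, W); H and W must be divisible by the window size.
        B, C, H, W = x.shape
        t = x.permute(0, 2, 3, 1)                        # (B, H, W, C) tokens
        w = window_partition(self.norm(t), self.ws)      # non-overlapping windows
        attn, _ = self.msa(w, w, w, need_weights=False)  # local self-attention
        t = t + window_reverse(attn, self.ws, H, W)      # residual connection
        y = t.permute(0, 3, 1, 2)                        # back to (B, C, H, W)
        return y * self.ca(y)                            # channel re-weighting


if __name__ == "__main__":
    x = torch.rand(2, 64, 32, 32)
    print(HybridAttentionSketch()(x).shape)  # torch.Size([2, 64, 32, 32])
```

Similarly, the composite objective is only named in this record, not specified. A common instantiation is sketched below under the assumption of an L1 pixel term plus an L1 distance between 2-D Fourier spectra; the weight `lambda_freq` is a placeholder, not a value from the paper.

```python
import torch
import torch.nn as nn


class CompositeLossSketch(nn.Module):
    """Assumed form of a pixel + frequency-domain loss (illustrative only)."""

    def __init__(self, lambda_freq: float = 0.05):
        super().__init__()
        self.lambda_freq = lambda_freq  # placeholder weight, not from the paper
        self.l1 = nn.L1Loss()

    def forward(self, sr: torch.Tensor, hr: torch.Tensor) -> torch.Tensor:
        pixel_loss = self.l1(sr, hr)
        # torch.fft.fft2 transforms the last two (spatial) dimensions; the L1
        # distance between complex spectra penalizes missing high frequencies.
        freq_loss = torch.mean(
            torch.abs(torch.fft.fft2(sr, norm="ortho")
                      - torch.fft.fft2(hr, norm="ortho"))
        )
        return pixel_loss + self.lambda_freq * freq_loss


if __name__ == "__main__":
    sr, hr = torch.rand(2, 3, 64, 64), torch.rand(2, 3, 64, 64)
    print(CompositeLossSketch()(sr, hr).item())
```

Weighting the spectral term low relative to the pixel term is a common design choice in frequency-augmented SR losses, since the pixel term anchors overall fidelity while the spectral term sharpens high-frequency detail; the abstract does not state which balance RASHAT uses.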
Pages: 12