RealFormer: Transformer Likes Residual Attention

Cited by: 0
Authors
He, Ruining [1 ]
Ravula, Anirudh [1 ]
Kanagal, Bhargav [1 ]
Ainslie, Joshua [1 ]
Affiliation
[1] Google Research, Mountain View, CA 94043, USA
Keywords: (none listed)
DOI: N/A
CLC number: TP18 [Artificial Intelligence Theory]
Discipline codes: 081104; 0812; 0835; 1405
Abstract
Transformer is the backbone of modern NLP models. In this paper, we propose RealFormer, a simple and generic technique to create Residual Attention Layer Transformer networks that significantly outperform the canonical Transformer and its variants (BERT, ETC, etc.) on a wide spectrum of tasks including Masked Language Modeling, GLUE, SQuAD, Neural Machine Translation, WikiHop, HotpotQA, Natural Questions, and OpenKP. We also observe empirically that RealFormer stabilizes training and leads to models with sparser attention. Source code and pre-trained checkpoints for RealFormer can be found at https://github.com/google-research/google-research/tree/master/realformer.
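
To make the abstract's central idea concrete: RealFormer adds a residual "skip edge" over attention scores, so each layer's pre-softmax attention logits are the sum of its own scaled dot-product scores and the raw scores of the layer below. Below is a minimal single-head NumPy sketch of that idea; the function names, toy shapes, and the omission of projections, feed-forward blocks, and LayerNorm are illustrative assumptions, not the paper's released implementation (see the repository URL above for that).

import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def residual_attention(q, k, v, prev_scores=None):
    """One attention head with RealFormer-style residual attention.

    q, k, v: [seq_len, d] query/key/value matrices (hypothetical toy setup).
    prev_scores: pre-softmax attention scores [seq_len, seq_len] from the
        layer below, or None in the first layer.

    Returns the attention output and the raw scores, which the caller
    passes on to the next layer as `prev_scores`.
    """
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)      # standard scaled dot-product logits
    if prev_scores is not None:
        scores = scores + prev_scores  # the residual "skip edge" on attention
    return softmax(scores) @ v, scores

# Stacking layers: each layer reuses the previous layer's raw scores.
rng = np.random.default_rng(0)
x = rng.normal(size=(8, 16))           # toy sequence: 8 tokens, d = 16
prev = None
for _ in range(3):                     # three illustrative layers
    x, prev = residual_attention(x, x, x, prev)
    # (per-layer projections, FFN, and LayerNorm omitted for brevity)

Note the design point this sketch isolates: the residual connection is on the attention *logits*, added before the softmax, rather than on the layer outputs as in the canonical Transformer's hidden-state residuals.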
Pages: 929-943
Page count: 15
Related Papers
50 records in total
  • [1] Residual Swin Transformer Channel Attention Network for Image Demosaicing
    Xing, Wenzhu
    Egiazarian, Karen
    2022 10TH EUROPEAN WORKSHOP ON VISUAL INFORMATION PROCESSING (EUVIP), 2022,
  • [2] Residual adaptive sparse hybrid attention transformer for image super resolution
    Huan, Hai
    Wang, Mingxuan
    ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2024, 136
  • [3] Dilated transformer: residual axial attention for breast ultrasound image segmentation
    Shen, Xiaoyan
    Wang, Liangyu
    Zhao, Yu
    Liu, Ruibo
    Qian, Wei
    Ma, He
    QUANTITATIVE IMAGING IN MEDICINE AND SURGERY, 2022: 4512 - 4528
  • [4] GRAformer: A gated residual attention transformer for multivariate time series forecasting
    Yang, Chengcao
    Wang, Yutian
    Yang, Bing
    Chen, Jun
    NEUROCOMPUTING, 2024, 581
  • [5] Image denoising using channel attention residual enhanced Swin Transformer
    Dai, Qiang
    Cheng, Xi
    Zhang, Li
    MULTIMEDIA TOOLS AND APPLICATIONS, 2024, 83 (07) : 19041 - 19059
  • [6] CRAformer: A cross-residual attention transformer for solar irradiation multistep forecasting
    Zhang, Zongbin
    Huang, Xiaoqiao
    Li, Chengli
    Cheng, Feiyan
    Tai, Yonghang
    ENERGY, 2025, 320
  • [7] RDTNet: A residual deformable attention based transformer network for breast cancer classification
    Babita
    Nayak, Deepak Ranjan
    EXPERT SYSTEMS WITH APPLICATIONS, 2024, 249
  • [8] TransRA: transformer and residual attention fusion for single remote sensing image dehazing
    Dong, Pengwei
    Wang, Bo
    MULTIDIMENSIONAL SYSTEMS AND SIGNAL PROCESSING, 2022, 33 (04) : 1119 - 1138