Mask Attention Networks: Rethinking and Strengthen Transformer

Cited by: 0
Authors
Fan, Zhihao [1]
Gong, Yeyun [2]
Liu, Dayiheng [3]
Wei, Zhongyu [1,6]
Wang, Siyuan [1]
Jiao, Jian [4]
Duan, Nan [2]
Zhang, Ruofei [4]
Huang, Xuanjing [5]
Affiliations
[1] Fudan Univ, Sch Data Sci, Shanghai, Peoples R China
[2] Microsoft Res Asia, Beijing, Peoples R China
[3] DAMO Acad, Hangzhou, Peoples R China
[4] Microsoft, Beijing, Peoples R China
[5] Fudan Univ, Sch Comp Sci, Shanghai, Peoples R China
[6] Fudan Univ, Res Inst Intelligent & Complex Syst, Shanghai, Peoples R China
Keywords
DOI
Not available
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
Transformer is an attention-based neural network consisting of two sublayers: the Self-Attention Network (SAN) and the Feed-Forward Network (FFN). Existing research explores enhancing the two sublayers separately to improve Transformer's capability for text representation. In this paper, we present a novel understanding of SAN and FFN as Mask Attention Networks (MANs) and show that they are two special cases of MANs with static mask matrices. However, their static mask matrices limit the capability for localness modeling in text representation learning. We therefore introduce a new layer named Dynamic Mask Attention Network (DMAN) with a learnable mask matrix, which is able to model localness adaptively. To incorporate the advantages of DMAN, SAN, and FFN, we propose a sequential layered structure that combines the three types of layers. Extensive experiments on various tasks, including neural machine translation and text summarization, demonstrate that our model outperforms the original Transformer.
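The unifying view in the abstract (SAN and FFN as Mask Attention Networks differing only in their mask matrix) can be illustrated with a minimal NumPy sketch. This is an assumption-laden reading of the formulation, not the authors' code: the function name `mask_attention` and the row re-normalization detail are illustrative, and single-head, unbatched attention is used for brevity.

```python
import numpy as np

def mask_attention(Q, K, V, M):
    """Mask Attention Network (MAN) sketch: a mask matrix M re-weights
    the attention map element-wise, and rows are re-normalized before
    aggregating the values."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                    # (n, n) attention logits
    A = np.exp(scores - scores.max(axis=-1, keepdims=True))
    A = A / A.sum(axis=-1, keepdims=True)            # row-wise softmax
    A = M * A                                        # element-wise masking
    A = A / (A.sum(axis=-1, keepdims=True) + 1e-9)   # re-normalize each row
    return A @ V

n, d = 4, 8
rng = np.random.default_rng(0)
Q = rng.standard_normal((n, d))
K = rng.standard_normal((n, d))
V = rng.standard_normal((n, d))

# SAN corresponds to an all-ones mask (every token attends to every token);
# FFN corresponds to an identity mask (each token attends only to itself,
# so the output reduces to V). A DMAN would instead learn M from the input.
san_out = mask_attention(Q, K, V, np.ones((n, n)))
ffn_out = mask_attention(Q, K, V, np.eye(n))
```

With the identity mask, each row of the masked attention map re-normalizes to a one-hot vector, so `ffn_out` equals `V`, matching the claim that FFN is the degenerate case where no cross-token mixing occurs.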
Pages: 1692-1701 (10 pages)
Related Papers
50 records in total
  • [31] Rethinking CNN Architectures in Transformer Detectors
    Pan, Mengze
    Tian, Kai
    Liao, Qingmin
    ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING, ICANN 2023, PART X, 2023, 14263 : 382 - 393
  • [32] Rethinking Traffic Speed Prediction with Traffic Flow-Aware Graph Attention Networks
    Ham, Seung Woo
    Kim, Dong-Kyu
    IEEE Conference on Intelligent Transportation Systems, Proceedings, ITSC, 2023, : 4770 - 4775
  • [34] AiATrack: Attention in Attention for Transformer Visual Tracking
    Gao, Shenyuan
    Zhou, Chunluan
    Ma, Chao
    Wang, Xinggang
    Yuan, Junsong
    COMPUTER VISION, ECCV 2022, PT XXII, 2022, 13682 : 146 - 164
  • [35] Rethinking Attention: Exploring Shallow Feed-Forward Neural Networks as an Alternative to Attention Layers in Transformers (Student Abstract)
    Dordevic, Danilo
    Bozic, Vukasin
    Thommes, Joseph
    Coppola, Daniele
    Singh, Sidak Pal
    THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 21, 2024, : 23477 - 23479
  • [36] Enhancing Low-Light Images with Kolmogorov-Arnold Networks in Transformer Attention
    Brateanu, Alexandru
    Balmez, Raul
    Orhei, Ciprian
    Ancuti, Cosmin
    Ancuti, Codruta
    SENSORS, 2025, 25 (02)
  • [37] Supervised Spatial Transformer Networks for Attention Learning in Fine-grained Action Recognition
    Liu, Dichao
    Wang, Yu
    Kato, Jien
    VISAPP: PROCEEDINGS OF THE 14TH INTERNATIONAL JOINT CONFERENCE ON COMPUTER VISION, IMAGING AND COMPUTER GRAPHICS THEORY AND APPLICATIONS, VOL 4, 2019, : 311 - 318
  • [38] Attention-Guided Spatial Transformer Networks for Fine-Grained Visual Recognition
    Liu, Dichao
    Wang, Yu
    Kato, Jien
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2019, E102D (12) : 2577 - 2586
  • [39] A study of attention information from transformer layers in hybrid medical image segmentation networks
    Hasany, Syed Nouman
    Petitjean, Caroline
    Meriaudeau, Fabrice
    MEDICAL IMAGING 2023, 2023, 12464
  • [40] Transformer-based multi-attention hybrid networks for skin lesion segmentation
    Dong, Zhiwei
    Li, Jinjiang
    Hua, Zhen
    EXPERT SYSTEMS WITH APPLICATIONS, 2024, 244