Mask Attention Networks: Rethinking and Strengthen Transformer

Cited by: 0
Authors
Fan, Zhihao [1]
Gong, Yeyun [2]
Liu, Dayiheng [3]
Wei, Zhongyu [1,6]
Wang, Siyuan [1]
Jiao, Jian [4]
Duan, Nan [2]
Zhang, Ruofei [4]
Huang, Xuanjing [5]
Affiliations
[1] Fudan Univ, Sch Data Sci, Shanghai, Peoples R China
[2] Microsoft Res Asia, Beijing, Peoples R China
[3] DAMO Acad, Hangzhou, Peoples R China
[4] Microsoft, Beijing, Peoples R China
[5] Fudan Univ, Sch Comp Sci, Shanghai, Peoples R China
[6] Fudan Univ, Res Inst Intelligent & Complex Syst, Shanghai, Peoples R China
Keywords
DOI
Not available
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
Transformer is an attention-based neural network that consists of two sublayers, namely the Self-Attention Network (SAN) and the Feed-Forward Network (FFN). Existing research explores enhancing the two sublayers separately to improve the capability of Transformer for text representation. In this paper, we present a novel understanding of SAN and FFN as Mask Attention Networks (MANs) and show that they are two special cases of MANs with static mask matrices. However, their static mask matrices limit the capability for localness modeling in text representation learning. We therefore introduce a new layer named Dynamic Mask Attention Network (DMAN) with a learnable mask matrix that is able to model localness adaptively. To incorporate the advantages of DMAN, SAN, and FFN, we propose a sequential layered structure that combines the three types of layers. Extensive experiments on various tasks, including neural machine translation and text summarization, demonstrate that our model outperforms the original Transformer.
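The abstract's unifying view can be made concrete with a small numerical sketch. The NumPy snippet below is a minimal illustration, assuming a mask attention layer that weights the unnormalized attention scores element-wise by a mask matrix M before row normalization; the function names, toy shapes, and the random stand-in for DMAN's learned mask are illustrative assumptions rather than the paper's exact parameterization.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def mask_attention(Q, K, V, M, eps=1e-9):
    """One mask attention step (sketch): scaled dot-product scores are
    weighted element-wise by a mask matrix M and then row-normalized."""
    d = Q.shape[-1]
    scores = np.exp(Q @ K.T / np.sqrt(d))   # unnormalized attention scores
    weighted = scores * M                   # element-wise mask weighting
    attn = weighted / (weighted.sum(axis=-1, keepdims=True) + eps)
    return attn @ V

# Toy sequence of 5 tokens with hidden size 8 (shapes are illustrative only).
rng = np.random.default_rng(0)
n, d = 5, 8
Q, K, V = (rng.standard_normal((n, d)) for _ in range(3))

M_san = np.ones((n, n))   # static all-ones mask: every token attends everywhere (SAN case)
M_ffn = np.eye(n)         # static identity mask: each token attends only to itself (FFN case)
M_dman = softmax(rng.standard_normal((n, n)), axis=-1)  # placeholder for a learned, input-dependent mask (DMAN)

for name, M in [("SAN", M_san), ("FFN", M_ffn), ("DMAN", M_dman)]:
    print(name, mask_attention(Q, K, V, M).shape)
```

With the all-ones mask the layer reduces to plain self-attention; with the identity mask each position mixes only its own value, mirroring the position-wise character of FFN; and a learned, input-dependent mask can sit between the two extremes to emphasize local neighborhoods, which is the role the abstract assigns to DMAN.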
Pages: 1692-1701
Number of pages: 10
Related Papers
50 records in total
  • [1] Bayesian Transformer Using Disentangled Mask Attention
    Chien, Jen-Tzung
    Huang, Yu-Han
    INTERSPEECH 2022, 2022, : 1761 - 1765
  • [2] Couplformer: Rethinking Vision Transformer with Coupling Attention
    Lan, Hai
    Wang, Xihao
    Shen, Hao
    Liang, Peidong
    Wei, Xian
    2023 IEEE/CVF WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV), 2023, : 6464 - 6473
  • [3] Synthesizer: Rethinking Self-Attention for Transformer Models
    Tay, Yi
    Bahri, Dara
    Metzler, Donald
    Juan, Da-Cheng
    Zhao, Zhe
    Zheng, Che
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 139, 2021, 139 : 7192 - 7203
  • [4] Masked-attention Mask Transformer for Universal Image Segmentation
    Cheng, Bowen
    Misra, Ishan
    Schwing, Alexander G.
    Kirillov, Alexander
    Girdhar, Rohit
    2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022), 2022, : 1280 - 1289
  • [5] Structural Attention: Rethinking Transformer for Unpaired Medical Image Synthesis
    Phan, Vu Minh Hieu
    Xie, Yutong
    Zhang, Bowen
    Qi, Yuankai
    Liao, Zhibin
    Perperidis, Antonios
    Phung, Son Lam
    Verjans, Johan W.
    To, Minh-Son
    MEDICAL IMAGE COMPUTING AND COMPUTER ASSISTED INTERVENTION - MICCAI 2024, PT VII, 2024, 15007 : 690 - 700
  • [6] Mask Attention-SRGAN for Mobile Sensing Networks
    Huang, Chi-En
    Chang, Ching-Chun
    Li, Yung-Hui
    SENSORS, 2021, 21 (17)
  • [7] Mask-Attention-Free Transformer for 3D Instance Segmentation
    Lai, Xin
    Yuan, Yuhui
    Chu, Ruihang
    Chen, Yukang
    Hu, Han
    Jia, Jiaya
    2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION, ICCV, 2023, : 3670 - 3680
  • [8] Universal Graph Transformer Self-Attention Networks
    Nguyen, Dai Quoc
    Nguyen, Tu Dinh
    Phung, Dinh
    COMPANION PROCEEDINGS OF THE WEB CONFERENCE 2022, WWW 2022 COMPANION, 2022, : 193 - 196
  • [9] Multi-Agent Transformer Networks With Graph Attention
    Jin, Woobeen
    Lee, Hyukjoon
    IEEE ACCESS, 2024, 12 : 144982 - 144991
  • [10] Hierarchical Attention Transformer Networks for Long Document Classification
    Hu, Yongli
    Chen, Puman
    Liu, Tengfei
    Gao, Junbin
    Sun, Yanfeng
    Yin, Baocai
    2021 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2021,