Tree Transformer: Integrating Tree Structures into Self-Attention

Cited by: 0
Authors
Wang, Yau-Shian [1]
Lee, Hung-Yi [1]
Chen, Yun-Nung [1]
Affiliations
[1] Natl Taiwan Univ, Taipei, Taiwan
Keywords
DOI
Not available
CLC classification
TP18 [Artificial Intelligence Theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
Pre-training a Transformer on large-scale raw text and fine-tuning it on the desired task have achieved state-of-the-art results on diverse NLP tasks. However, it is unclear what the learned attention captures: the attention computed by the attention heads does not seem to match human intuitions about hierarchical structures. This paper proposes Tree Transformer, which adds an extra constraint to the attention heads of the bidirectional Transformer encoder in order to encourage them to follow tree structures. The tree structures can be automatically induced from raw text by our proposed "Constituent Attention" module, which is implemented simply as self-attention between adjacent words. With the same training procedure as BERT, the experiments demonstrate the effectiveness of Tree Transformer in terms of inducing tree structures, better language modeling, and learning more explainable attention scores.
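The mechanism the abstract describes, adjacent-word self-attention producing merge probabilities that form a constituent prior over the full attention matrix, can be sketched in a few lines. Below is a minimal PyTorch sketch of that idea, not the authors' reference implementation: the helper name constituent_prior, the projection weights w_q/w_k, the -inf padding at sequence boundaries, and the log-space product are all illustrative assumptions.

```python
import math
import torch

def constituent_prior(x, w_q, w_k, prev_a=None):
    """Sketch of the "Constituent Attention" idea (hypothetical helper):
    score adjacent word pairs, turn the scores into merge probabilities
    a[k] that words k and k+1 belong to the same constituent, and build
    a prior C with C[i, j] = product of a[k] over all pairs between
    positions i and j."""
    b, n, d = x.shape                                         # (batch, words, dim)
    q, k = x @ w_q, x @ w_k
    # Scaled dot-product scores between each word and its neighbours.
    s_right = (q[:, :-1] * k[:, 1:]).sum(-1) / math.sqrt(d)   # (b, n-1)
    s_left = (q[:, 1:] * k[:, :-1]).sum(-1) / math.sqrt(d)    # (b, n-1)
    # Each word normalises over its two neighbours (-inf pads the ends).
    pad = torch.full((b, 1), float("-inf"), device=x.device)
    links = torch.stack(
        [torch.cat([pad, s_left], 1),    # probability of linking leftward
         torch.cat([s_right, pad], 1)],  # probability of linking rightward
        dim=-1).softmax(-1)              # (b, n, 2)
    # Adjacent words must agree: the geometric mean of "k links right"
    # and "k+1 links left" is the merge probability for pair k.
    a_hat = (links[:, :-1, 1] * links[:, 1:, 0]).sqrt()       # (b, n-1)
    # Hierarchical constraint: constituents can only grow across layers.
    a = a_hat if prev_a is None else prev_a + (1 - prev_a) * a_hat
    # C[i, j] = prod of a[k] between i and j, via cumulative log-sums
    # so the whole prior costs O(n^2) rather than O(n^3).
    log_a = torch.cat([torch.zeros(b, 1, device=x.device), a.log()], 1)
    cum = log_a.cumsum(1)                                     # (b, n)
    log_c = -(cum.unsqueeze(2) - cum.unsqueeze(1)).abs()      # (b, n, n)
    return log_c.exp(), a
```

The returned prior would then gate each head's attention elementwise, e.g. `attn = attn_scores.softmax(-1) * prior`, so that attention across constituent boundaries is suppressed; in the paper the prior is shared by all heads within a layer, and passing `a` back in as `prev_a` at the next layer lets the induced constituents merge into the larger spans that form the tree.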
Pages: 1061-1070
Page count: 10
Related papers
50 in total
  • [31] Singularformer: Learning to Decompose Self-Attention to Linearize the Complexity of Transformer
    Wu, Yifan
    Kan, Shichao
    Zeng, Min
    Li, Min
    PROCEEDINGS OF THE THIRTY-SECOND INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, IJCAI 2023, 2023: 4433 - 4441
  • [32] Nucleic Transformer: Classifying DNA Sequences with Self-Attention and Convolutions
    He, Shujun
    Gao, Baizhen
    Sabnis, Rushant
    Sun, Qing
    ACS SYNTHETIC BIOLOGY, 2023, 12 (11): 3205 - 3214
  • [33] E.T.: Re-Thinking Self-Attention for Transformer Models on GPUs
    Chen, Shiyang
    Huang, Shaoyi
    Pandey, Santosh
    Li, Bingbing
    Gao, Guang R.
    Zheng, Long
    Ding, Caiwen
    Liu, Hang
    SC21: INTERNATIONAL CONFERENCE FOR HIGH PERFORMANCE COMPUTING, NETWORKING, STORAGE AND ANALYSIS, 2021
  • [34] Top-k Self-Attention in Transformer for Video Inpainting
    Li, Guanxiao
    Zhang, Ke
    Su, Yu
    Wang, JingYu
    2024 5TH INTERNATIONAL CONFERENCE ON COMPUTER ENGINEERING AND APPLICATION, ICCEA 2024, 2024: 1038 - 1042
  • [35] Additional Self-Attention Transformer With Adapter for Thick Haze Removal
    Cai, Zhenyang
    Ning, Jin
    Ding, Zhiheng
    Duo, Bin
    IEEE GEOSCIENCE AND REMOTE SENSING LETTERS, 2024, 21 : 1 - 5
  • [36] Transformer Self-Attention Change Detection Network with Frozen Parameters
    Cheng, Peiyang
    Xia, Min
    Wang, Dehao
    Lin, Haifeng
    Zhao, Zikai
    APPLIED SCIENCES-BASEL, 2025, 15 (06)
  • [37] ABVS breast tumour segmentation via integrating CNN with dilated sampling self-attention and feature interaction Transformer
    Liu, Yiyao
    Li, Jinyao
    Yang, Yi
    Zhao, Cheng
    Zhang, Yongtao
    Yang, Peng
    Dong, Lei
    Deng, Xiaofei
    Zhu, Ting
    Wang, Tianfu
    Jiang, Wei
    Lei, Baiying
    NEURAL NETWORKS, 2025, 187
  • [38] Lightweight Vision Transformer with Spatial and Channel Enhanced Self-Attention
    Zheng, Jiahao
    Yang, Longqi
    Li, Yiying
    Yang, Ke
    Wang, Zhiyuan
    Zhou, Jun
    2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION WORKSHOPS, ICCVW, 2023, : 1484 - 1488
  • [39] Spectral Superresolution Using Transformer with Convolutional Spectral Self-Attention
    Liao, Xiaomei
    He, Lirong
    Mao, Jiayou
    Xu, Meng
    REMOTE SENSING, 2024, 16 (10)
  • [40] CMAT: Integrating Convolution Mixer and Self-Attention for Visual Tracking
    Wang, Jun
    Yin, Peng
    Wang, Yuanyun
    Yang, Wenhui
    IEEE TRANSACTIONS ON MULTIMEDIA, 2024, 26 : 326 - 338