Tree Transformer: Integrating Tree Structures into Self-Attention

Cited by: 0
Authors
Wang, Yau-Shian [1]
Lee, Hung-Yi [1]
Chen, Yun-Nung [1]
Affiliations
[1] Natl Taiwan Univ, Taipei, Taiwan
Keywords
DOI
Not available
CLC classification
TP18 [Artificial Intelligence Theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
Pre-training a Transformer on large-scale raw text and fine-tuning it on the desired task have achieved state-of-the-art results on diverse NLP tasks. However, it is unclear what the learned attention captures: the attention computed by the attention heads does not seem to match human intuitions about hierarchical structures. This paper proposes Tree Transformer, which adds an extra constraint to the attention heads of the bidirectional Transformer encoder in order to encourage them to follow tree structures. The tree structures can be automatically induced from raw text by our proposed "Constituent Attention" module, which is implemented simply as self-attention between adjacent words. With the same training procedure as BERT, the experiments demonstrate the effectiveness of Tree Transformer in terms of inducing tree structures, better language modeling, and learning more explainable attention scores.
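The mechanism the abstract describes, adjacent-word self-attention producing merge probabilities that form a constituent prior over the full attention matrix, can be sketched in a few lines. Below is a minimal PyTorch sketch of that idea, not the authors' reference implementation: the helper name constituent_prior, the projection weights w_q/w_k, the -inf padding at sequence boundaries, and the log-space product are all illustrative assumptions.

```python
import math
import torch

def constituent_prior(x, w_q, w_k, prev_a=None):
    """Sketch of the "Constituent Attention" idea (hypothetical helper):
    score adjacent word pairs, turn the scores into merge probabilities
    a[k] that words k and k+1 belong to the same constituent, and build
    a prior C with C[i, j] = product of a[k] over all pairs between
    positions i and j."""
    b, n, d = x.shape                                         # (batch, words, dim)
    q, k = x @ w_q, x @ w_k
    # Scaled dot-product scores between each word and its neighbours.
    s_right = (q[:, :-1] * k[:, 1:]).sum(-1) / math.sqrt(d)   # (b, n-1)
    s_left = (q[:, 1:] * k[:, :-1]).sum(-1) / math.sqrt(d)    # (b, n-1)
    # Each word normalises over its two neighbours (-inf pads the ends).
    pad = torch.full((b, 1), float("-inf"), device=x.device)
    links = torch.stack(
        [torch.cat([pad, s_left], 1),    # probability of linking leftward
         torch.cat([s_right, pad], 1)],  # probability of linking rightward
        dim=-1).softmax(-1)              # (b, n, 2)
    # Adjacent words must agree: the geometric mean of "k links right"
    # and "k+1 links left" is the merge probability for pair k.
    a_hat = (links[:, :-1, 1] * links[:, 1:, 0]).sqrt()       # (b, n-1)
    # Hierarchical constraint: constituents can only grow across layers.
    a = a_hat if prev_a is None else prev_a + (1 - prev_a) * a_hat
    # C[i, j] = prod of a[k] between i and j, via cumulative log-sums
    # so the whole prior costs O(n^2) rather than O(n^3).
    log_a = torch.cat([torch.zeros(b, 1, device=x.device), a.log()], 1)
    cum = log_a.cumsum(1)                                     # (b, n)
    log_c = -(cum.unsqueeze(2) - cum.unsqueeze(1)).abs()      # (b, n, n)
    return log_c.exp(), a
```

The returned prior would then gate each head's attention elementwise, e.g. `attn = attn_scores.softmax(-1) * prior`, so that attention across constituent boundaries is suppressed; in the paper the prior is shared by all heads within a layer, and passing `a` back in as `prev_a` at the next layer lets the induced constituents merge into the larger spans that form the tree.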
Pages: 1061-1070
Page count: 10
Related papers
50 in total
  • [31] Singularformer: Learning to Decompose Self-Attention to Linearize the Complexity of Transformer
    Wu, Yifan
    Kan, Shichao
    Zeng, Min
    Li, Min
    PROCEEDINGS OF THE THIRTY-SECOND INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, IJCAI 2023, 2023: 4433 - 4441
  • [32] Nucleic Transformer: Classifying DNA Sequences with Self-Attention and Convolutions
    He, Shujun
    Gao, Baizhen
    Sabnis, Rushant
    Sun, Qing
    ACS SYNTHETIC BIOLOGY, 2023, 12 (11): 3205 - 3214
  • [33] E.T.: Re-Thinking Self-Attention for Transformer Models on GPUs
    Chen, Shiyang
    Huang, Shaoyi
    Pandey, Santosh
    Li, Bingbing
    Gao, Guang R.
    Zheng, Long
    Ding, Caiwen
    Liu, Hang
    SC21: INTERNATIONAL CONFERENCE FOR HIGH PERFORMANCE COMPUTING, NETWORKING, STORAGE AND ANALYSIS, 2021
  • [34] Top-k Self-Attention in Transformer for Video Inpainting
    Li, Guanxiao
    Zhang, Ke
    Su, Yu
    Wang, JingYu
    2024 5TH INTERNATIONAL CONFERENCE ON COMPUTER ENGINEERING AND APPLICATION, ICCEA 2024, 2024: 1038 - 1042
  • [35] Additional Self-Attention Transformer With Adapter for Thick Haze Removal
    Cai, Zhenyang
    Ning, Jin
    Ding, Zhiheng
    Duo, Bin
    IEEE GEOSCIENCE AND REMOTE SENSING LETTERS, 2024, 21 : 1 - 5
  • [36] Transformer Self-Attention Change Detection Network with Frozen Parameters
    Cheng, Peiyang
    Xia, Min
    Wang, Dehao
    Lin, Haifeng
    Zhao, Zikai
    APPLIED SCIENCES-BASEL, 2025, 15 (06)
  • [37] ABVS breast tumour segmentation via integrating CNN with dilated sampling self-attention and feature interaction Transformer
    Liu, Yiyao
    Li, Jinyao
    Yang, Yi
    Zhao, Cheng
    Zhang, Yongtao
    Yang, Peng
    Dong, Lei
    Deng, Xiaofei
    Zhu, Ting
    Wang, Tianfu
    Jiang, Wei
    Lei, Baiying
    NEURAL NETWORKS, 2025, 187
  • [38] Lightweight Vision Transformer with Spatial and Channel Enhanced Self-Attention
    Zheng, Jiahao
    Yang, Longqi
    Li, Yiying
    Yang, Ke
    Wang, Zhiyuan
    Zhou, Jun
    2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION WORKSHOPS, ICCVW, 2023, : 1484 - 1488
  • [39] Spectral Superresolution Using Transformer with Convolutional Spectral Self-Attention
    Liao, Xiaomei
    He, Lirong
    Mao, Jiayou
    Xu, Meng
    REMOTE SENSING, 2024, 16 (10)
  • [40] CMAT: Integrating Convolution Mixer and Self-Attention for Visual Tracking
    Wang, Jun
    Yin, Peng
    Wang, Yuanyun
    Yang, Wenhui
    IEEE TRANSACTIONS ON MULTIMEDIA, 2024, 26 : 326 - 338