Analyzing the Structure of Attention in a Transformer Language Model

Cited: 0
Authors
Vig, Jesse [1 ]
Belinkov, Yonatan [2 ,3 ]
Affiliations
[1] Palo Alto Res Ctr, Machine Learning & Data Sci Grp, Interact & Analyt Lab, Palo Alto, CA 94304 USA
[2] Harvard John A Paulson Sch Engn & Appl Sci, Cambridge, MA USA
[3] MIT, Comp Sci & Artificial Intelligence Lab, 77 Massachusetts Ave, Cambridge, MA 02139 USA
Source
BLACKBOXNLP WORKSHOP ON ANALYZING AND INTERPRETING NEURAL NETWORKS FOR NLP AT ACL 2019 | 2019
Keywords
(none listed)
DOI
Not available
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The Transformer is a fully attention-based alternative to recurrent networks that has achieved state-of-the-art results across a range of NLP tasks. In this paper, we analyze the structure of attention in a Transformer language model, the GPT-2 small pretrained model. We visualize attention for individual instances and analyze the interaction between attention and syntax over a large corpus. We find that attention targets different parts of speech at different layer depths within the model, and that attention aligns with dependency relations most strongly in the middle layers. We also find that the deepest layers of the model capture the most distant relationships. Finally, we extract exemplar sentences that reveal highly specific patterns targeted by particular attention heads.
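The kind of per-layer, per-head attention extraction the abstract describes can be approximated with off-the-shelf tooling. The following is a minimal sketch, not the authors' code: it assumes the Hugging Face transformers package and the public gpt2 checkpoint (GPT-2 small, 12 layers x 12 heads), and it computes a crude per-layer proxy for attention distance, namely the mean offset between each token and the position it attends to most.

    # Minimal sketch, not the authors' implementation. Assumes the Hugging Face
    # `transformers` package and the public `gpt2` checkpoint (GPT-2 small).
    import torch
    from transformers import GPT2Tokenizer, GPT2Model

    tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
    model = GPT2Model.from_pretrained("gpt2", output_attentions=True)
    model.eval()

    sentence = "The doctor asked the nurse a question because she was busy."
    inputs = tokenizer(sentence, return_tensors="pt")

    with torch.no_grad():
        outputs = model(**inputs)

    # outputs.attentions is a tuple with one tensor per layer,
    # each shaped (batch, num_heads, seq_len, seq_len).
    for layer_idx, layer_attn in enumerate(outputs.attentions):
        attn = layer_attn[0]                 # (num_heads, seq_len, seq_len)
        most_attended = attn.argmax(dim=-1)  # most-attended position per token, per head
        token_positions = torch.arange(attn.size(-1))
        # GPT-2 attention is causal, so offsets are always >= 0.
        mean_offset = (token_positions - most_attended).float().mean().item()
        print(f"layer {layer_idx:2d}: mean argmax-attention offset = {mean_offset:.2f}")

This argmax-offset statistic is only a stand-in for the paper's corpus-level analysis, which additionally aligns attention with part-of-speech tags and dependency relations, but it is enough to observe the reported trend of deeper layers attending over longer distances.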
Pages: 63-76 (14 pages)
Related papers (50 in total; 10 listed below)
  • [1] Analyzing Encoded Concepts in Transformer Language Models
    Sajjad, Hassan
    Durrani, Nadir
    Dalvi, Fahim
    Alam, Firoj
    Khan, Abdul Rafae
    Xu, Jia
    NAACL 2022: THE 2022 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES, 2022: 3082-3101
  • [2] Attention Flows: Analyzing and Comparing Attention Mechanisms in Language Models
    DeRose, Joseph F.
    Wang, Jiayao
    Berger, Matthew
    IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS, 2021, 27 (02): 1160-1170
  • [3] Retrofitting Structure-aware Transformer Language Model for End Tasks
    Fei, Hao
    Ren, Yafeng
    Ji, Donghong
    PROCEEDINGS OF THE 2020 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP), 2020: 2151-2161
  • [4] Transformer Encoder with Protein Language Model for Protein Secondary Structure Prediction
    Kazm, Ammar
    Ali, Aida
    Hashim, Haslina
    ENGINEERING TECHNOLOGY & APPLIED SCIENCE RESEARCH, 2024, 14 (02): 13124-13132
  • [5] Heterogeneous attention based transformer for sign language translation
    Zhang, Hao
    Sun, Yixiang
    Liu, Zenghui
    Liu, Qiyuan
    Liu, Xiyao
    Jiang, Ming
    Schafer, Gerald
    Fang, Hui
    APPLIED SOFT COMPUTING, 2023, 144
  • [6] Attention Analysis and Calibration for Transformer in Natural Language Generation
    Lu, Yu
    Zhang, Jiajun
    Zeng, Jiali
    Wu, Shuangzhi
    Zong, Chengqing
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2022, 30: 1927-1938
  • [7] A Multiscale Visualization of Attention in the Transformer Model
    Vig, Jesse
    PROCEEDINGS OF THE 57TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: SYSTEM DEMONSTRATIONS (ACL 2019), 2019: 37+
  • [8] Cross-Cultural Language Proficiency Scaling using Transformer and Attention Mechanism Hybrid Model
    Zainal, Anna Gustina
    Misba, M.
    Pathak, Punit
    Patra, Indrajit
    Gopi, Adapa
    El-Ebiary, Yousef A. Baker
    Prema, S.
    INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2024, 15 (06): 1144-1153
  • [9] A Study on Performance Enhancement by Integrating Neural Topic Attention with Transformer-Based Language Model
    Um, Taehum
    Kim, Namhyoung
    APPLIED SCIENCES-BASEL, 2024, 14 (17)
  • [10] Transformer Gate Attention Model: An Improved Attention Model for Visual Question Answering
    Zhang, Haotian
    Wu, Wei
    2022 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2022