Analyzing the Structure of Attention in a Transformer Language Model

Cited: 0
Authors
Vig, Jesse [1 ]
Belinkov, Yonatan [2 ,3 ]
Affiliations
[1] Palo Alto Res Ctr, Machine Learning & Data Sci Grp, Interact & Analyt Lab, Palo Alto, CA 94304 USA
[2] Harvard John A Paulson Sch Engn & Appl Sci, Cambridge, MA USA
[3] MIT, Comp Sci & Artificial Intelligence Lab, 77 Massachusetts Ave, Cambridge, MA 02139 USA
Source
BLACKBOXNLP WORKSHOP ON ANALYZING AND INTERPRETING NEURAL NETWORKS FOR NLP AT ACL 2019 | 2019
Keywords
(none listed)
DOI
Not available
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The Transformer is a fully attention-based alternative to recurrent networks that has achieved state-of-the-art results across a range of NLP tasks. In this paper, we analyze the structure of attention in a Transformer language model, the GPT-2 small pretrained model. We visualize attention for individual instances and analyze the interaction between attention and syntax over a large corpus. We find that attention targets different parts of speech at different layer depths within the model, and that attention aligns with dependency relations most strongly in the middle layers. We also find that the deepest layers of the model capture the most distant relationships. Finally, we extract exemplar sentences that reveal highly specific patterns targeted by particular attention heads.
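The kind of per-layer, per-head attention extraction the abstract describes can be approximated with off-the-shelf tooling. The following is a minimal sketch, not the authors' code: it assumes the Hugging Face transformers package and the public gpt2 checkpoint (GPT-2 small, 12 layers x 12 heads), and it computes a crude per-layer proxy for attention distance, namely the mean offset between each token and the position it attends to most.

    # Minimal sketch, not the authors' implementation. Assumes the Hugging Face
    # `transformers` package and the public `gpt2` checkpoint (GPT-2 small).
    import torch
    from transformers import GPT2Tokenizer, GPT2Model

    tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
    model = GPT2Model.from_pretrained("gpt2", output_attentions=True)
    model.eval()

    sentence = "The doctor asked the nurse a question because she was busy."
    inputs = tokenizer(sentence, return_tensors="pt")

    with torch.no_grad():
        outputs = model(**inputs)

    # outputs.attentions is a tuple with one tensor per layer,
    # each shaped (batch, num_heads, seq_len, seq_len).
    for layer_idx, layer_attn in enumerate(outputs.attentions):
        attn = layer_attn[0]                 # (num_heads, seq_len, seq_len)
        most_attended = attn.argmax(dim=-1)  # most-attended position per token, per head
        token_positions = torch.arange(attn.size(-1))
        # GPT-2 attention is causal, so offsets are always >= 0.
        mean_offset = (token_positions - most_attended).float().mean().item()
        print(f"layer {layer_idx:2d}: mean argmax-attention offset = {mean_offset:.2f}")

This argmax-offset statistic is only a stand-in for the paper's corpus-level analysis, which additionally aligns attention with part-of-speech tags and dependency relations, but it is enough to observe the reported trend of deeper layers attending over longer distances.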
Pages: 63-76 (14 pages)
Related papers (50 in total; 10 listed below)
  • [1] Analyzing Encoded Concepts in Transformer Language Models
    Sajjad, Hassan
    Durrani, Nadir
    Dalvi, Fahim
    Alam, Firoj
    Khan, Abdul Rafae
    Xu, Jia
    NAACL 2022: THE 2022 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES, 2022: 3082-3101
  • [2] Attention Flows: Analyzing and Comparing Attention Mechanisms in Language Models
    DeRose, Joseph F.
    Wang, Jiayao
    Berger, Matthew
    IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS, 2021, 27 (02): 1160-1170
  • [3] Retrofitting Structure-aware Transformer Language Model for End Tasks
    Fei, Hao
    Ren, Yafeng
    Ji, Donghong
    PROCEEDINGS OF THE 2020 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP), 2020: 2151-2161
  • [4] Transformer Encoder with Protein Language Model for Protein Secondary Structure Prediction
    Kazm, Ammar
    Ali, Aida
    Hashim, Haslina
    ENGINEERING TECHNOLOGY & APPLIED SCIENCE RESEARCH, 2024, 14 (02): 13124-13132
  • [5] Heterogeneous attention based transformer for sign language translation
    Zhang, Hao
    Sun, Yixiang
    Liu, Zenghui
    Liu, Qiyuan
    Liu, Xiyao
    Jiang, Ming
    Schafer, Gerald
    Fang, Hui
    APPLIED SOFT COMPUTING, 2023, 144
  • [6] Attention Analysis and Calibration for Transformer in Natural Language Generation
    Lu, Yu
    Zhang, Jiajun
    Zeng, Jiali
    Wu, Shuangzhi
    Zong, Chengqing
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2022, 30: 1927-1938
  • [7] A Multiscale Visualization of Attention in the Transformer Model
    Vig, Jesse
    PROCEEDINGS OF THE 57TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: SYSTEM DEMONSTRATIONS (ACL 2019), 2019: 37+
  • [8] Cross-Cultural Language Proficiency Scaling using Transformer and Attention Mechanism Hybrid Model
    Zainal, Anna Gustina
    Misba, M.
    Pathak, Punit
    Patra, Indrajit
    Gopi, Adapa
    El-Ebiary, Yousef A. Baker
    Prema, S.
    INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2024, 15 (06): 1144-1153
  • [9] A Study on Performance Enhancement by Integrating Neural Topic Attention with Transformer-Based Language Model
    Um, Taehum
    Kim, Namhyoung
    APPLIED SCIENCES-BASEL, 2024, 14 (17)
  • [10] Transformer Gate Attention Model: An Improved Attention Model for Visual Question Answering
    Zhang, Haotian
    Wu, Wei
    2022 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2022