The Case for Translation-Invariant Self-Attention in Transformer-Based Language Models

Citations: 0

Authors:

Wennberg, Ulme [1]

Henter, Gustav Eje [1]

Affiliations:

[1] KTH Royal Inst Technol, Div Speech Mus & Hearing, Stockholm, Sweden

Source:

ACL-IJCNLP 2021: THE 59TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS AND THE 11TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING, VOL 2 | 2021

Keywords:

DOI:

Not available

Chinese Library Classification:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Mechanisms for encoding positional information are central to transformer-based language models. In this paper, we analyze the position embeddings of existing language models, finding strong evidence of translation invariance, both for the embeddings themselves and for their effect on self-attention. The degree of translation invariance increases during training and correlates positively with model performance. Our findings lead us to propose translation-invariant self-attention (TISA), which accounts for the relative position between tokens in an interpretable fashion without needing conventional position embeddings. Our proposal has several theoretical advantages over existing position-representation approaches. Experiments show that it improves on regular ALBERT on GLUE tasks, while adding orders of magnitude fewer positional parameters.
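
The idea of a translation-invariant positional term can be illustrated with a minimal sketch. It assumes the positional contribution is a scalar bias added to the attention logits, and that this bias depends only on the offset j - i between two tokens. The sum-of-Gaussian-kernels form, the function names, and the KERNELS values are hypothetical choices made for illustration; the abstract does not specify the parameterization the paper uses.

```python
import math

def relative_bias(offset, kernels):
    """Score a relative offset with a sum of Gaussian-shaped bumps.

    kernels is a list of (amplitude, sharpness, center) triples.
    This parameterization is an assumption for illustration only.
    """
    return sum(a * math.exp(-abs(b) * (offset - c) ** 2) for a, b, c in kernels)

def bias_matrix(n, kernels):
    # Entry (i, j) depends only on j - i, so every diagonal is constant.
    return [[relative_bias(j - i, kernels) for j in range(n)] for i in range(n)]

def softmax(row):
    m = max(row)  # subtract the maximum for numerical stability
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def attention_weights(content_logits, kernels):
    """Add the positional bias to content-based logits, then normalize each row."""
    n = len(content_logits)
    bias = bias_matrix(n, kernels)
    return [
        softmax([content_logits[i][j] + bias[i][j] for j in range(n)])
        for i in range(n)
    ]

# Two hypothetical kernels: six positional parameters, whatever the sequence length.
KERNELS = [(1.0, 0.5, 0.0), (0.5, 0.1, -2.0)]
```

Because the bias matrix is constant along each diagonal, shifting both token positions by the same amount leaves the positional term unchanged, and the number of positional parameters in this sketch does not grow with sequence length.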

Pages: 130-140

Page count: 11
