Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention

Cited by: 0
Authors
Katharopoulos, Angelos [1 ,2 ]
Vyas, Apoorv [1 ,2 ]
Pappas, Nikolaos [3 ]
Fleuret, Francois [2 ,4 ]
Affiliations
[1] Idiap Res Inst, Martigny, Switzerland
[2] Ecole Polytech Fed Lausanne, Lausanne, Switzerland
[3] Univ Washington, Seattle, WA 98195 USA
[4] Univ Geneva, Geneva, Switzerland
Funding
Swiss National Science Foundation
Keywords
DOI
Not available
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104; 0812; 0835; 1405;
Abstract
Transformers achieve remarkable performance in several tasks, but due to their quadratic complexity with respect to the input's length they are prohibitively slow for very long sequences. To address this limitation, we express self-attention as a linear dot-product of kernel feature maps and exploit the associativity of matrix products to reduce the complexity from O(N²) to O(N), where N is the sequence length. We show that this formulation permits an iterative implementation that dramatically accelerates autoregressive transformers and reveals their relationship to recurrent neural networks. Our linear transformers achieve performance similar to vanilla transformers and are up to 4000× faster on autoregressive prediction of very long sequences.
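The associativity argument in the abstract can be made concrete with a small sketch. The following NumPy code is a minimal illustration, not the authors' released implementation: it assumes a simple positive feature map (elu(x) + 1) standing in for the kernel feature map φ, and shows both the non-causal O(N) form and the per-token recurrence that makes the autoregressive model behave like an RNN.

```python
import numpy as np

def feature_map(x):
    # elu(x) + 1, a simple positive feature map; the associativity
    # trick below works for any non-negative feature map phi.
    return np.where(x > 0, x + 1.0, np.exp(x))

def linear_attention(Q, K, V):
    """Non-causal linear attention: phi(Q) (phi(K)^T V), normalized.

    Computing the (d, d_v) summary phi(K)^T V once makes the cost
    O(N d d_v) instead of the O(N^2 d) of softmax attention.
    """
    Qf, Kf = feature_map(Q), feature_map(K)   # (N, d)
    kv = Kf.T @ V                             # (d, d_v) summary
    z = Qf @ Kf.sum(axis=0)                   # (N,) normalizers
    return (Qf @ kv) / z[:, None]

def causal_linear_attention(Q, K, V):
    """Autoregressive case written as a recurrence: two running sums
    act as an RNN hidden state, giving O(1) cost per generated token."""
    Qf, Kf = feature_map(Q), feature_map(K)
    S = np.zeros((Q.shape[1], V.shape[1]))    # running sum of phi(k_j) v_j^T
    z = np.zeros(Q.shape[1])                  # running sum of phi(k_j)
    out = np.empty_like(V)
    for i in range(Q.shape[0]):
        S += np.outer(Kf[i], V[i])
        z += Kf[i]
        out[i] = (Qf[i] @ S) / (Qf[i] @ z)
    return out

# Tiny usage example with random inputs.
rng = np.random.default_rng(0)
N, d = 6, 4
Q, K, V = (rng.standard_normal((N, d)) for _ in range(3))
full = linear_attention(Q, K, V)
causal = causal_linear_attention(Q, K, V)
# The last causal position attends over the whole sequence, so it
# agrees with the non-causal result at that position.
assert np.allclose(full[-1], causal[-1])
```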
Pages: 10
Related Papers
50 records in total
  • [1] GAPFORMER: Fast Autoregressive Transformers meet RNNs for Personalized Adaptive Cruise Control
    Sachdeva, Noveen
    Wang, Ziran
    Han, Kyungtae
    Gupta, Rohit
    McAuley, Julian
    2022 IEEE 25TH INTERNATIONAL CONFERENCE ON INTELLIGENT TRANSPORTATION SYSTEMS (ITSC), 2022, : 2528 - 2535
  • [2] Finetuning Pretrained Transformers into RNNs
    Kasai, Jungo
    Peng, Hao
    Zhang, Yizhe
    Yogatama, Dani
    Ilharco, Gabriel
    Pappas, Nikolaos
    Mao, Yi
    Chen, Weizhu
    Smith, Noah A.
    2021 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP 2021), 2021, : 10630 - 10643
  • [3] Fast Vision Transformers with HiLo Attention
    Pan, Zizheng
    Cai, Jianfei
    Zhuang, Bohan
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35, NEURIPS 2022, 2022,
  • [4] Linear Transformers Are Secretly Fast Weight Programmers
    Schlag, Imanol
    Irie, Kazuki
    Schmidhuber, Jurgen
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 139, 2021, 139
  • [5] Fast transformers
    Matthias Wuttig
    Martin Salinga
    Nature Materials, 2012, 11 (4) : 270 - 271
  • [6] Pose Transformers (POTR): Human Motion Prediction with Non-Autoregressive Transformers
    Martinez-Gonzalez, Angel
    Villamizar, Michael
    Odobez, Jean-Marc
    2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION WORKSHOPS (ICCVW 2021), 2021, : 2276 - 2284
  • [7] On the Learning of Non-Autoregressive Transformers
    Huang, Fei
    Tao, Tianhua
    Zhou, Hao
    Li, Lei
    Huang, Minlie
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 162, 2022,
  • [8] LeaPformer: Enabling Linear Transformers for Autoregressive and Simultaneous Tasks via Learned Proportions
    Agostinelli, Victor
    Hong, Sanghyun
    Chen, Lizhong
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, 2024, 235 : 452 - 470
  • [9] Training-free Neural Architecture Search for RNNs and Transformers
    Serianni, Aaron
    Kalita, Jugal
    PROCEEDINGS OF THE 61ST ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, ACL 2023, VOL 1, 2023, : 2522 - 2540
  • [10] Coding Small Group Communication with AI: RNNs and Transformers with Context
    Pilny, Andrew
    Bonito, Joseph
    Schecter, Aaron
    SMALL GROUP RESEARCH, 2025,