Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention

Cited by: 0
Authors
Katharopoulos, Angelos [1 ,2 ]
Vyas, Apoorv [1 ,2 ]
Pappas, Nikolaos [3 ]
Fleuret, Francois [2 ,4 ]
Affiliations
[1] Idiap Res Inst, Martigny, Switzerland
[2] Ecole Polytech Fed Lausanne, Lausanne, Switzerland
[3] Univ Washington, Seattle, WA 98195 USA
[4] Univ Geneva, Geneva, Switzerland
Funding
Swiss National Science Foundation
Keywords
DOI
Not available
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104; 0812; 0835; 1405;
Abstract
Transformers achieve remarkable performance in several tasks, but due to their quadratic complexity with respect to the input's length they are prohibitively slow for very long sequences. To address this limitation, we express self-attention as a linear dot-product of kernel feature maps and exploit the associativity of matrix products to reduce the complexity from O(N²) to O(N), where N is the sequence length. We show that this formulation permits an iterative implementation that dramatically accelerates autoregressive transformers and reveals their relationship to recurrent neural networks. Our linear transformers achieve performance similar to vanilla transformers and are up to 4000× faster on autoregressive prediction of very long sequences.
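The associativity argument in the abstract can be made concrete with a small sketch. The following NumPy code is a minimal illustration, not the authors' released implementation: it assumes a simple positive feature map (elu(x) + 1) standing in for the kernel feature map φ, and shows both the non-causal O(N) form and the per-token recurrence that makes the autoregressive model behave like an RNN.

```python
import numpy as np

def feature_map(x):
    # elu(x) + 1, a simple positive feature map; the associativity
    # trick below works for any non-negative feature map phi.
    return np.where(x > 0, x + 1.0, np.exp(x))

def linear_attention(Q, K, V):
    """Non-causal linear attention: phi(Q) (phi(K)^T V), normalized.

    Computing the (d, d_v) summary phi(K)^T V once makes the cost
    O(N d d_v) instead of the O(N^2 d) of softmax attention.
    """
    Qf, Kf = feature_map(Q), feature_map(K)   # (N, d)
    kv = Kf.T @ V                             # (d, d_v) summary
    z = Qf @ Kf.sum(axis=0)                   # (N,) normalizers
    return (Qf @ kv) / z[:, None]

def causal_linear_attention(Q, K, V):
    """Autoregressive case written as a recurrence: two running sums
    act as an RNN hidden state, giving O(1) cost per generated token."""
    Qf, Kf = feature_map(Q), feature_map(K)
    S = np.zeros((Q.shape[1], V.shape[1]))    # running sum of phi(k_j) v_j^T
    z = np.zeros(Q.shape[1])                  # running sum of phi(k_j)
    out = np.empty_like(V)
    for i in range(Q.shape[0]):
        S += np.outer(Kf[i], V[i])
        z += Kf[i]
        out[i] = (Qf[i] @ S) / (Qf[i] @ z)
    return out

# Tiny usage example with random inputs.
rng = np.random.default_rng(0)
N, d = 6, 4
Q, K, V = (rng.standard_normal((N, d)) for _ in range(3))
full = linear_attention(Q, K, V)
causal = causal_linear_attention(Q, K, V)
# The last causal position attends over the whole sequence, so it
# agrees with the non-causal result at that position.
assert np.allclose(full[-1], causal[-1])
```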
Pages: 10
Related Papers
50 records in total
  • [1] GAPFORMER: Fast Autoregressive Transformers meet RNNs for Personalized Adaptive Cruise Control
    Sachdeva, Noveen
    Wang, Ziran
    Han, Kyungtae
    Gupta, Rohit
    McAuley, Julian
    2022 IEEE 25TH INTERNATIONAL CONFERENCE ON INTELLIGENT TRANSPORTATION SYSTEMS (ITSC), 2022, : 2528 - 2535
  • [2] Finetuning Pretrained Transformers into RNNs
    Kasai, Jungo
    Peng, Hao
    Zhang, Yizhe
    Yogatama, Dani
    Ilharco, Gabriel
    Pappas, Nikolaos
    Mao, Yi
    Chen, Weizhu
    Smith, Noah A.
    2021 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP 2021), 2021, : 10630 - 10643
  • [3] Fast Vision Transformers with HiLo Attention
    Pan, Zizheng
    Cai, Jianfei
    Zhuang, Bohan
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35, NEURIPS 2022, 2022,
  • [4] Linear Transformers Are Secretly Fast Weight Programmers
    Schlag, Imanol
    Irie, Kazuki
    Schmidhuber, Jurgen
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 139, 2021, 139
  • [5] Fast transformers
    Matthias Wuttig
    Martin Salinga
    Nature Materials, 2012, 11 (4) : 270 - 271
  • [6] Pose Transformers (POTR): Human Motion Prediction with Non-Autoregressive Transformers
    Martinez-Gonzalez, Angel
    Villamizar, Michael
    Odobez, Jean-Marc
    2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION WORKSHOPS (ICCVW 2021), 2021, : 2276 - 2284
  • [7] On the Learning of Non-Autoregressive Transformers
    Huang, Fei
    Tao, Tianhua
    Zhou, Hao
    Li, Lei
    Huang, Minlie
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 162, 2022,
  • [8] LeaPformer: Enabling Linear Transformers for Autoregressive and Simultaneous Tasks via Learned Proportions
    Agostinelli, Victor
    Hong, Sanghyun
    Chen, Lizhong
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, 2024, 235 : 452 - 470
  • [9] Training-free Neural Architecture Search for RNNs and Transformers
    Serianni, Aaron
    Kalita, Jugal
    PROCEEDINGS OF THE 61ST ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, ACL 2023, VOL 1, 2023, : 2522 - 2540
  • [10] Coding Small Group Communication with AI: RNNs and Transformers with Context
    Pilny, Andrew
    Bonito, Joseph
    Schecter, Aaron
    SMALL GROUP RESEARCH, 2025,