Fast Decoding in Sequence Models Using Discrete Latent Variables

Times cited: 0
Authors
Kaiser, Lukasz [1]
Roy, Aurko [1]
Vaswani, Ashish [1]
Parmar, Niki [1]
Bengio, Samy [1]
Uszkoreit, Jakob [1]
Shazeer, Noam [1]
Affiliation
[1] Google Brain, Mountain View, CA 94043 USA
Keywords
Quantization
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Autoregressive sequence models based on deep neural networks, such as RNNs, WaveNet, and the Transformer, attain state-of-the-art results on many tasks. However, they are difficult to parallelize and are thus slow at processing long sequences. RNNs lack parallelism during both training and decoding, while architectures like WaveNet and the Transformer are much more parallelizable during training, yet still operate sequentially during decoding. We present a method to extend sequence models using discrete latent variables that makes decoding much more parallelizable. We first autoencode the target sequence into a shorter sequence of discrete latent variables, which is generated autoregressively at inference time, and finally decode the output sequence from this shorter latent sequence in parallel. To this end, we introduce a novel method for constructing a sequence of discrete latent variables and compare it with previously introduced methods. Finally, we evaluate our model end-to-end on the task of neural machine translation, where it is an order of magnitude faster at decoding than comparable autoregressive models. While lower in BLEU than purely autoregressive models, our model achieves higher scores than previously proposed non-autoregressive translation models.
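The two-stage decoding scheme described in the abstract can be sketched as follows. This is a minimal, hypothetical illustration, not the authors' implementation: `latent_step` and `parallel_decoder` are assumed interfaces standing in for the learned networks, the compression factor `compression` is a free parameter, and the target length is assumed to be known before decoding. What the sketch shows is the cost model: generating a latent sequence that is shorter by a factor k takes ceil(n/k) sequential steps plus one parallel decoding pass, instead of n sequential steps for token-by-token decoding.

```python
from typing import Callable, List, Sequence, Tuple

Token = int
Latent = int

# Hypothetical interfaces standing in for learned networks (assumptions, not the paper's API).
LatentStep = Callable[[Sequence[Token], List[Latent]], Latent]
ParallelDecoder = Callable[[Sequence[Token], List[Latent], int], List[Token]]


def latent_length(target_len: int, compression: int) -> int:
    """Length of the shortened latent sequence: ceil(target_len / compression)."""
    return -(-target_len // compression)


def fast_decode(
    source: Sequence[Token],
    target_len: int,
    compression: int,
    latent_step: LatentStep,
    parallel_decoder: ParallelDecoder,
) -> Tuple[List[Token], int]:
    """Generate latents autoregressively, then decode every output position in one pass.

    Returns the output tokens and the number of sequential model calls made.
    """
    latents: List[Latent] = []
    sequential_calls = 0
    # Stage 1: autoregressive generation over the short latent sequence.
    for _ in range(latent_length(target_len, compression)):
        latents.append(latent_step(source, latents))
        sequential_calls += 1
    # Stage 2: a single non-autoregressive pass produces all output positions at once.
    output = parallel_decoder(source, latents, target_len)
    sequential_calls += 1
    return output, sequential_calls


# Toy stand-ins so the sketch runs; real models would be neural networks.
def toy_latent_step(source: Sequence[Token], prev: List[Latent]) -> Latent:
    # Emits latent codes 0, 1, 2, 3, 0, ... regardless of the source.
    return len(prev) % 4


def make_toy_decoder(compression: int) -> ParallelDecoder:
    def decode(source: Sequence[Token], latents: List[Latent], n: int) -> List[Token]:
        # Each latent "covers" a block of `compression` consecutive output positions.
        return [latents[i // compression] for i in range(n)]
    return decode


if __name__ == "__main__":
    out, calls = fast_decode([5, 6, 7], 32, 8, toy_latent_step, make_toy_decoder(8))
    print(calls, "sequential calls vs", 32, "for token-by-token decoding")
```

With n = 32 and k = 8, the sketch makes 5 sequential calls where token-by-token decoding would make 32. Actual wall-clock speedups also depend on the cost of each call, so the step count is only a rough proxy.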
Pages: 10