Fast Decoding in Sequence Models Using Discrete Latent Variables

Cited by: 0
Authors
Kaiser, Lukasz [1 ]
Roy, Aurko [1 ]
Vaswani, Ashish [1 ]
Parmar, Niki [1 ]
Bengio, Samy [1 ]
Uszkoreit, Jakob [1 ]
Shazeer, Noam [1 ]
Affiliations
[1] Google Brain, Mountain View, CA 94043 USA
Keywords
QUANTIZATION;
DOI
Not available
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
Autoregressive sequence models based on deep neural networks, such as RNNs, WaveNet, and the Transformer, attain state-of-the-art results on many tasks. However, they are difficult to parallelize and are thus slow at processing long sequences. RNNs lack parallelism both during training and decoding, while architectures like WaveNet and the Transformer are much more parallelizable during training, yet still operate sequentially during decoding. We present a method to extend sequence models using discrete latent variables that makes decoding much more parallelizable. We first autoencode the target sequence into a shorter sequence of discrete latent variables, which at inference time is generated autoregressively, and finally decode the output sequence from this shorter latent sequence in parallel. To this end, we introduce a novel method for constructing a sequence of discrete latent variables and compare it with previously introduced methods. Finally, we evaluate our model end-to-end on the task of neural machine translation, where it is an order of magnitude faster at decoding than comparable autoregressive models. While lower in BLEU than purely autoregressive models, our model achieves higher scores than previously proposed non-autoregressive translation models.
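The decoding scheme the abstract describes (autoregressive generation of a short latent sequence, followed by a single parallel pass that decodes the full output) can be sketched as below. This is a minimal illustration, not the authors' implementation: the component names (latent_prior_step, parallel_decoder), the latent vocabulary size K, and the compression factor C are assumptions, and the model internals are stubbed with random placeholders.

    import numpy as np

    rng = np.random.default_rng(0)

    K = 512    # size of the discrete latent vocabulary (assumed)
    V = 32000  # target (subword) vocabulary size (assumed)
    C = 8      # compression factor: one latent per C output tokens (assumed)

    def latent_prior_step(latent_prefix):
        # Autoregressive prior over the discrete latents.
        # Placeholder: a real model would condition on the source sentence
        # and on latent_prefix; here we simply draw random logits.
        logits = rng.standard_normal(K)
        return int(np.argmax(logits))

    def parallel_decoder(latents, out_len):
        # Decodes every output position at once, conditioned on the latents.
        # Placeholder: returns random token ids of the requested length.
        return rng.integers(0, V, size=out_len)

    def fast_decode(out_len):
        # Step 1: sequential generation, but only ceil(out_len / C) steps
        # instead of the out_len steps a purely autoregressive model needs.
        n_latents = (out_len + C - 1) // C
        latents = []
        for _ in range(n_latents):
            latents.append(latent_prior_step(latents))
        # Step 2: a single parallel pass recovers the full output sequence.
        return parallel_decoder(np.array(latents), out_len)

    print(fast_decode(out_len=64)[:10])

The speed-up comes from Step 1 running for roughly out_len / C sequential steps rather than out_len, while Step 2 is fully parallel across output positions.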
Pages: 10