Lattice generation in attention-based speech recognition models

Cited: 6
Authors
Zapotoczny, Michal [1]
Pietrzak, Piotr [1]
Lancucki, Adrian [1]
Chorowski, Jan [1]
Affiliations
[1] Univ Wroclaw, Wroclaw, Poland
Source
Interspeech 2019
Keywords
speech recognition; beam search; artificial neural networks; attention-based models; lattice generation; decoding
DOI
10.21437/Interspeech.2019-2667
CLC numbers
R36 (Pathology); R76 (Otorhinolaryngology)
Subject classification codes
100104; 100213
Abstract
Attention-based neural speech recognition models are frequently decoded with beam search, which produces a tree of hypotheses. In many cases, such as when using external language models, numerous decoding hypotheses need to be considered, requiring large beam sizes during decoding. We demonstrate that it is possible to merge certain nodes in the tree of hypotheses in order to obtain a decoding lattice, which increases the number of decoding hypotheses without increasing the number of candidates that are scored by the neural network. We propose a convolutional architecture, which facilitates comparing states of the model at different positions. The experiments are carried out on the Wall Street Journal dataset, where the lattice decoder obtains lower word error rates with smaller beam sizes than an otherwise similar architecture with regular beam search.
Pages: 2225-2229
Number of pages: 5
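The abstract above describes merging nodes of the beam-search tree into a lattice, so that more hypotheses are covered without scoring more candidates with the neural network. Below is a minimal, illustrative Python sketch of that idea, not the authors' implementation: it assumes a hypothetical decoder_step(state, last_token) -> (new_state, log_probs) interface, and its merge rule (collapsing hypotheses that share their last few output tokens) is only a stand-in for the paper's learned, convolution-based comparison of decoder states.

```python
# Illustrative lattice-style beam search (sketch, not the paper's code).
# Assumed interface: decoder_step(state, last_token) -> (new_state, log_probs),
# where log_probs is indexable by token id.

def lattice_beam_search(decoder_step, init_state, sos, eos, vocab_size,
                        beam_size=4, merge_context=3, max_len=200):
    # A lattice node groups hypotheses treated as equivalent. It keeps the
    # decoder state and score of its best hypothesis plus all incoming arcs
    # (predecessor node, token, score), so a lattice can be read off later.
    start = {"state": init_state, "score": 0.0, "tokens": (sos,), "arcs": []}
    frontier = {(sos,): start}
    finished = []

    for _ in range(max_len):
        candidates = {}
        for node in frontier.values():
            new_state, log_probs = decoder_step(node["state"], node["tokens"][-1])
            # Expand only the top `beam_size` tokens of each surviving node.
            top = sorted(range(vocab_size), key=lambda t: -log_probs[t])[:beam_size]
            for tok in top:
                tokens = node["tokens"] + (tok,)
                score = node["score"] + log_probs[tok]
                key = tokens[-merge_context:]  # stand-in merge criterion
                if key in candidates:
                    # Merge into an existing lattice node: record an extra
                    # incoming arc, keep the better-scoring hypothesis' state.
                    cand = candidates[key]
                    cand["arcs"].append((node, tok, score))
                    if score > cand["score"]:
                        cand["score"], cand["state"], cand["tokens"] = score, new_state, tokens
                else:
                    candidates[key] = {"state": new_state, "score": score,
                                       "tokens": tokens,
                                       "arcs": [(node, tok, score)]}
        # Only `beam_size` lattice nodes survive, so the number of candidates
        # scored by the network per step does not grow despite the merges.
        best = sorted(candidates.values(), key=lambda n: -n["score"])[:beam_size]
        frontier = {}
        for cand in best:
            if cand["tokens"][-1] == eos:
                finished.append(cand)  # completed hypotheses leave the beam
            else:
                frontier[cand["tokens"][-merge_context:]] = cand
        if not frontier:
            break
    return finished or list(frontier.values())
```

Following the arcs backwards from the finished nodes yields the decoding lattice; per the abstract, the extra hypotheses it encodes are what make rescoring with an external language model effective at small beam sizes.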
Related papers (50 in total)
• [1] Attention-Based Models for Speech Recognition. Chorowski, Jan; Bahdanau, Dzmitry; Serdyuk, Dmitriy; Cho, Kyunghyun; Bengio, Yoshua. Advances in Neural Information Processing Systems 28 (NIPS 2015), 2015.
• [2] Towards Understanding Attention-Based Speech Recognition Models. Qin, Chu-Xiong; Qu, Dan. IEEE Access, 2020, 8: 24358-24369.
• [3] An Online Attention-Based Model for Speech Recognition. Fan, Ruchao; Zhou, Pan; Chen, Wei; Jia, Jia; Liu, Gang. Interspeech 2019, 2019: 4390-4394.
• [4] Confidence Estimation for Attention-Based Sequence-to-Sequence Models for Speech Recognition. Li, Qiujia; Qiu, David; Zhang, Yu; Li, Bo; He, Yanzhang; Woodland, Philip C.; Cao, Liangliang; Strohman, Trevor. 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2021), 2021: 6388-6392.
• [5] Siamese Attention-Based LSTM for Speech Emotion Recognition. Nizamidin, Tashpolat; Zhao, Li; Liang, Ruiyu; Xie, Yue; Hamdulla, Askar. IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences, 2020, E103A(07): 937-941.
• [6] Attention-Based Speech Recognition Using Gaze Information. Segawa, Osamu; Hayashi, Tomoki; Takeda, Kazuya. 2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU 2019), 2019: 465-470.
• [7] Attention-Based Dense LSTM for Speech Emotion Recognition. Xie, Yue; Liang, Ruiyu; Liang, Zhenlin; Zhao, Li. IEICE Transactions on Information and Systems, 2019, E102D(07): 1426-1429.
• [8] Streaming Attention-Based Models with Augmented Memory for End-to-End Speech Recognition. Yeh, Ching-Feng; Wang, Yongqiang; Shi, Yangyang; Wu, Chunyang; Zhang, Frank; Chan, Julian; Seltzer, Michael L. 2021 IEEE Spoken Language Technology Workshop (SLT), 2021: 8-14.
• [9] Speech Emotion Recognition via Generation Using an Attention-Based Variational Recurrent Neural Network. Baruah, Murchana; Banerjee, Bonny. Interspeech 2022, 2022: 4710-4714.
• [10] Effective Exploitation of Posterior Information for Attention-Based Speech Recognition. Tang, Jian; Hou, Junfeng; Song, Yan; Dai, Li-Rong; McLoughlin, Ian. IEEE Access, 2020, 8(08): 108988-108999.