Lattice generation in attention-based speech recognition models

Cited: 6
Authors
Zapotoczny, Michal [1]
Pietrzak, Piotr [1]
Lancucki, Adrian [1]
Chorowski, Jan [1]
Affiliations
[1] Univ Wroclaw, Wroclaw, Poland
Source
Interspeech 2019
Keywords
speech recognition; beam search; artificial neural networks; attention-based models; lattice generation; decoding
DOI
10.21437/Interspeech.2019-2667
CLC numbers
R36 (Pathology); R76 (Otorhinolaryngology)
Subject classification codes
100104; 100213
Abstract
Attention-based neural speech recognition models are frequently decoded with beam search, which produces a tree of hypotheses. In many cases, such as when using external language models, numerous decoding hypotheses need to be considered, requiring large beam sizes during decoding. We demonstrate that it is possible to merge certain nodes in the tree of hypotheses in order to obtain a decoding lattice, which increases the number of decoding hypotheses without increasing the number of candidates that are scored by the neural network. We propose a convolutional architecture, which facilitates comparing states of the model at different positions. The experiments are carried out on the Wall Street Journal dataset, where the lattice decoder obtains lower word error rates with smaller beam sizes than an otherwise similar architecture with regular beam search.
Pages: 2225-2229
Number of pages: 5
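The abstract above describes merging nodes of the beam-search tree into a lattice, so that more hypotheses are covered without scoring more candidates with the neural network. Below is a minimal, illustrative Python sketch of that idea, not the authors' implementation: it assumes a hypothetical decoder_step(state, last_token) -> (new_state, log_probs) interface, and its merge rule (collapsing hypotheses that share their last few output tokens) is only a stand-in for the paper's learned, convolution-based comparison of decoder states.

```python
# Illustrative lattice-style beam search (sketch, not the paper's code).
# Assumed interface: decoder_step(state, last_token) -> (new_state, log_probs),
# where log_probs is indexable by token id.

def lattice_beam_search(decoder_step, init_state, sos, eos, vocab_size,
                        beam_size=4, merge_context=3, max_len=200):
    # A lattice node groups hypotheses treated as equivalent. It keeps the
    # decoder state and score of its best hypothesis plus all incoming arcs
    # (predecessor node, token, score), so a lattice can be read off later.
    start = {"state": init_state, "score": 0.0, "tokens": (sos,), "arcs": []}
    frontier = {(sos,): start}
    finished = []

    for _ in range(max_len):
        candidates = {}
        for node in frontier.values():
            new_state, log_probs = decoder_step(node["state"], node["tokens"][-1])
            # Expand only the top `beam_size` tokens of each surviving node.
            top = sorted(range(vocab_size), key=lambda t: -log_probs[t])[:beam_size]
            for tok in top:
                tokens = node["tokens"] + (tok,)
                score = node["score"] + log_probs[tok]
                key = tokens[-merge_context:]  # stand-in merge criterion
                if key in candidates:
                    # Merge into an existing lattice node: record an extra
                    # incoming arc, keep the better-scoring hypothesis' state.
                    cand = candidates[key]
                    cand["arcs"].append((node, tok, score))
                    if score > cand["score"]:
                        cand["score"], cand["state"], cand["tokens"] = score, new_state, tokens
                else:
                    candidates[key] = {"state": new_state, "score": score,
                                       "tokens": tokens,
                                       "arcs": [(node, tok, score)]}
        # Only `beam_size` lattice nodes survive, so the number of candidates
        # scored by the network per step does not grow despite the merges.
        best = sorted(candidates.values(), key=lambda n: -n["score"])[:beam_size]
        frontier = {}
        for cand in best:
            if cand["tokens"][-1] == eos:
                finished.append(cand)  # completed hypotheses leave the beam
            else:
                frontier[cand["tokens"][-merge_context:]] = cand
        if not frontier:
            break
    return finished or list(frontier.values())
```

Following the arcs backwards from the finished nodes yields the decoding lattice; per the abstract, the extra hypotheses it encodes are what make rescoring with an external language model effective at small beam sizes.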
Related papers (50 in total)
• [1] Attention-Based Models for Speech Recognition. Chorowski, Jan; Bahdanau, Dzmitry; Serdyuk, Dmitriy; Cho, Kyunghyun; Bengio, Yoshua. Advances in Neural Information Processing Systems 28 (NIPS 2015), 2015.
• [2] Towards Understanding Attention-Based Speech Recognition Models. Qin, Chu-Xiong; Qu, Dan. IEEE Access, 2020, 8: 24358-24369.
• [3] An Online Attention-Based Model for Speech Recognition. Fan, Ruchao; Zhou, Pan; Chen, Wei; Jia, Jia; Liu, Gang. Interspeech 2019, 2019: 4390-4394.
• [4] Confidence Estimation for Attention-Based Sequence-to-Sequence Models for Speech Recognition. Li, Qiujia; Qiu, David; Zhang, Yu; Li, Bo; He, Yanzhang; Woodland, Philip C.; Cao, Liangliang; Strohman, Trevor. 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2021), 2021: 6388-6392.
• [5] Siamese Attention-Based LSTM for Speech Emotion Recognition. Nizamidin, Tashpolat; Zhao, Li; Liang, Ruiyu; Xie, Yue; Hamdulla, Askar. IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences, 2020, E103A(07): 937-941.
• [6] Attention-Based Speech Recognition Using Gaze Information. Segawa, Osamu; Hayashi, Tomoki; Takeda, Kazuya. 2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU 2019), 2019: 465-470.
• [7] Attention-Based Dense LSTM for Speech Emotion Recognition. Xie, Yue; Liang, Ruiyu; Liang, Zhenlin; Zhao, Li. IEICE Transactions on Information and Systems, 2019, E102D(07): 1426-1429.
• [8] Streaming Attention-Based Models with Augmented Memory for End-to-End Speech Recognition. Yeh, Ching-Feng; Wang, Yongqiang; Shi, Yangyang; Wu, Chunyang; Zhang, Frank; Chan, Julian; Seltzer, Michael L. 2021 IEEE Spoken Language Technology Workshop (SLT), 2021: 8-14.
• [9] Speech Emotion Recognition via Generation Using an Attention-Based Variational Recurrent Neural Network. Baruah, Murchana; Banerjee, Bonny. Interspeech 2022, 2022: 4710-4714.
• [10] Effective Exploitation of Posterior Information for Attention-Based Speech Recognition. Tang, Jian; Hou, Junfeng; Song, Yan; Dai, Li-Rong; McLoughlin, Ian. IEEE Access, 2020, 8(08): 108988-108999.