Lattice generation in attention-based speech recognition models

Cited by: 6
Authors
Zapotoczny, Michal [1]
Pietrzak, Piotr [1]
Lancucki, Adrian [1]
Chorowski, Jan [1]
Affiliations
[1] Univ Wroclaw, Wroclaw, Poland
Source
INTERSPEECH 2019 | 2019
Keywords
speech recognition; beam search; artificial neural networks; attention-based models; lattice generation; decoding;
DOI
10.21437/Interspeech.2019-2667
Chinese Library Classification
R36 [Pathology]; R76 [Otorhinolaryngology];
Discipline classification codes
100104; 100213;
Abstract
Attention-based neural speech recognition models are frequently decoded with beam search, which produces a tree of hypotheses. In many cases, such as when using external language models, numerous decoding hypotheses need to be considered, requiring large beam sizes during decoding. We demonstrate that it is possible to merge certain nodes in a tree of hypotheses in order to obtain a decoding lattice, which increases the number of decoding hypotheses without increasing the number of candidates that are scored by the neural network. We propose a convolutional architecture, which facilitates comparing states of the model at different pi[...] The experiments are carried out on the Wall Street Journal dataset, where the lattice decoder obtains lower word error rates with smaller beam sizes than an otherwise similar architecture with regular beam search.
Pages: 2225 - 2229
Number of pages: 5
Related papers
50 in total
  • [31] AN ANALYSIS OF DECODING FOR ATTENTION-BASED END-TO-END MANDARIN SPEECH RECOGNITION
    Jiang, Dongwei
    Zou, Wei
    Zhao, Shuaijiang
    Yang, Guilin
    Li, Xiangang
    2018 11TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP), 2018, : 384 - 388
  • [32] Attention-based Text Recognition in the Wild
    Yan, Zhi-Chen
    Yu, Stephanie A.
    PROCEEDINGS OF THE 1ST INTERNATIONAL CONFERENCE ON DEEP LEARNING THEORY AND APPLICATIONS (DELTA), 2020, : 42 - 49
  • [33] ATTENTION-BASED PARTIAL FACE RECOGNITION
    Hoermann, Stefan
    Zhang, Zeyuan
    Knoche, Martin
    Teepe, Torben
    Rigoll, Gerhard
    2021 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), 2021, : 2978 - 2982
  • [34] ATTENTION-BASED GATED SCALING ADAPTIVE ACOUSTIC MODEL FOR CTC-BASED SPEECH RECOGNITION
    Ding, Fenglin
    Guo, Wu
    Dai, Lirong
    Du, Jun
    2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 7404 - 7408
  • [35] A Fully Integrated 1.7mW Attention-Based Automatic Speech Recognition Processor
    Liou, Yi-Long
    Hsu, Jui-Yang
    Chen, Chen-Sheng
    Liu, Alexander H.
    Lee, Hung-Yi
    Liu, Tsung-Te
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS II-EXPRESS BRIEFS, 2022, 69 (10) : 4178 - 4182
  • [36] Multi-stream Attention-based BLSTM with Feature Segmentation for Speech Emotion Recognition
    Chiba, Yuya
    Nose, Takashi
    Ito, Akinori
    INTERSPEECH 2020, 2020, : 3301 - 3305
  • [37] On Online Attention-based Speech Recognition and Joint Mandarin Character-Pinyin Training
    Chan, William
    Lane, Ian
    17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES, 2016, : 3404 - 3408
  • [38] STREAM ATTENTION-BASED MULTI-ARRAY END-TO-END SPEECH RECOGNITION
    Wang, Xiaofei
    Li, Ruizhi
    Mallidi, Sri Harish
    Hori, Takaaki
    Watanabe, Shinji
    Hermansky, Hynek
    2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 7105 - 7109
  • [39] A NOVEL ATTENTION-BASED GATED RECURRENT UNIT AND ITS EFFICACY IN SPEECH EMOTION RECOGNITION
    Rajamani, Srividya Tirunellai
    Rajamani, Kumar T.
    Mallol-Ragolta, Adria
    Liu, Shuo
    Schuller, Bjoern
    2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 6294 - 6298
  • [40] Attention-based CNN and Relative Phase Feature Modeling for Improved Imagined Speech Recognition
    Niimura, Yoshiki
    Takemoto, Jun
    Kai, Atsuhiko
    Nakagawa, Seiichi
    2023 ASIA PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE, APSIPA ASC, 2023, : 8 - 14