Lattice generation in attention-based speech recognition models

Cited by: 6

Authors:

Zapotoczny, Michal [1]

Pietrzak, Piotr [1]

Lancucki, Adrian [1]

Chorowski, Jan [1]

Affiliations:

[1] Univ Wroclaw, Wroclaw, Poland

Source:

INTERSPEECH 2019 | 2019

Keywords:

speech recognition; beam search; artificial neural networks; attention-based models; lattice generation; decoding;

DOI:

10.21437/Interspeech.2019-2667

Chinese Library Classification (CLC) number:

R36 [Pathology]; R76 [Otorhinolaryngology];

Subject classification code:

100104 ; 100213 ;

Abstract:

Attention-based neural speech recognition models are frequently decoded with beam search, which produces a tree of hypotheses. In many cases, such as when using external language models, numerous decoding hypotheses need to be considered, requiring large beam sizes during decoding. We demonstrate that it is possible to merge certain nodes in a tree of hypotheses to obtain a decoding lattice, which increases the number of decoding hypotheses without increasing the number of candidates that are scored by the neural network. We propose a convolutional architecture, which facilitates comparing states of the model at different pi… The experiments are carried out on the Wall Street Journal dataset, where the lattice decoder obtains lower word error rates with smaller beam sizes than an otherwise similar architecture with regular beam search.
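The merging idea in the abstract can be illustrated with a toy sketch. This is not the authors' decoder: the paper merges nodes by comparing model states, whereas the sketch below assumes a stand-in rule that merges hypotheses whose last `merge_context` tokens agree. The names `beam_search_with_merging`, `uniform_scorer`, and `merge_context` are hypothetical and chosen only for illustration.

```python
import math


def logaddexp(a, b):
    """Numerically stable log(exp(a) + exp(b))."""
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def uniform_scorer(prefix):
    """Hypothetical stand-in for the neural network: two equally likely tokens."""
    return {"a": math.log(0.5), "b": math.log(0.5)}


def beam_search_with_merging(score_next, steps, beam_size, merge_context):
    """Toy beam search that merges hypotheses sharing their last tokens.

    Each beam node is (representative prefix, log-score, paths represented).
    The merge rule (equal last `merge_context` tokens) is an assumption made
    for this sketch, not the state-comparison rule used in the paper.
    """
    beam = [((), 0.0, 1)]
    for _ in range(steps):
        merged = {}
        for prefix, score, n_paths in beam:
            # The scorer is called once per surviving node, not once per path.
            for token, logp in score_next(prefix).items():
                new_prefix = prefix + (token,)
                key = new_prefix[-merge_context:]
                cand = score + logp
                if key not in merged:
                    merged[key] = (new_prefix, cand, cand, n_paths)
                else:
                    rep, best, total, count = merged[key]
                    if cand > best:
                        rep, best = new_prefix, cand
                    merged[key] = (rep, best, logaddexp(total, cand), count + n_paths)
        nodes = [(rep, total, count) for rep, _, total, count in merged.values()]
        beam = sorted(nodes, key=lambda node: node[1], reverse=True)[:beam_size]
    return beam


# Same beam size, with and without effective merging.
lattice = beam_search_with_merging(uniform_scorer, steps=3, beam_size=2, merge_context=1)
tree = beam_search_with_merging(uniform_scorer, steps=3, beam_size=2, merge_context=10)
paths_with_merging = sum(count for _, _, count in lattice)
paths_without_merging = sum(count for _, _, count in tree)
```

In this toy setting both runs score at most `beam_size` nodes per step, but the merged beam stands for more distinct token sequences than the plain tree, which is the effect the abstract describes.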

Citation

Pages: 2225-2229

Page count: 5

