CHARACTER-AWARE ATTENTION-BASED END-TO-END SPEECH RECOGNITION

Cited by: 0

Authors:

Meng, Zhong [1]

Gaur, Yashesh [1]

Li, Jinyu [1]

Gong, Yifan [1]

Affiliation:

[1] Microsoft Corp, Redmond, WA 98052 USA

Source:

2019 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU 2019) | 2019

Keywords:

character-aware; end-to-end; attention; encoder-decoder; speech recognition; neural networks

DOI:

10.1109/asru46091.2019.9004018

Chinese Library Classification:

TP18 [Artificial intelligence theory]

Subject classification codes:

081104; 0812; 0835; 1405

Abstract:

Predicting words and subword units (WSUs) as the output has been shown to be effective for the attention-based encoder-decoder (AED) model in end-to-end speech recognition. However, as one input to the decoder recurrent neural network (RNN), each WSU embedding is learned independently through context and acoustic information in a purely data-driven fashion. Little effort has been made to explicitly model the morphological relationships among WSUs. In this work, we propose a novel character-aware (CA) AED model in which each WSU embedding is computed by summarizing the embeddings of its constituent characters using a CA-RNN. This WSU-independent CA-RNN is jointly trained with the encoder, the decoder and the attention network of a conventional AED to predict WSUs. With CA-AED, the embeddings of morphologically similar WSUs are naturally and directly correlated through the CA-RNN, in addition to the semantic and acoustic relations modeled by a traditional AED. Moreover, CA-AED significantly reduces the number of model parameters relative to a traditional AED by replacing the large pool of WSU embeddings with a much smaller set of character embeddings. On a 3400-hour Microsoft Cortana dataset, CA-AED achieves up to 11.9% relative word error rate (WER) improvement over a strong AED baseline with 27.1% fewer model parameters.
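The central idea of the abstract, computing a WSU embedding by summarizing the embeddings of its characters with an RNN, can be illustrated with a small sketch. This is not the authors' implementation: the plain tanh recurrence, the class name `CharAwareEmbedder`, the dimensions, and the random initialization are all assumptions made for illustration, and the abstract does not specify the CA-RNN cell type or sizes.

```python
import math
import random


def init_matrix(rows, cols, rng):
    """Return a rows x cols matrix of small random weights."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


class CharAwareEmbedder:
    """Hypothetical toy RNN that maps a WSU string to one embedding vector.

    The same weights are shared by every WSU, so only the character
    embeddings and the RNN weights are stored, not one vector per WSU.
    """

    def __init__(self, alphabet, char_dim=4, hidden_dim=6, seed=0):
        rng = random.Random(seed)
        self.char_dim = char_dim
        self.hidden_dim = hidden_dim
        self.char_emb = {
            c: [rng.uniform(-0.1, 0.1) for _ in range(char_dim)] for c in alphabet
        }
        self.w_in = init_matrix(hidden_dim, char_dim, rng)
        self.w_rec = init_matrix(hidden_dim, hidden_dim, rng)

    def embed(self, wsu):
        """Run the recurrence over the characters; the last state is the embedding."""
        h = [0.0] * self.hidden_dim
        for ch in wsu:
            from_char = matvec(self.w_in, self.char_emb[ch])
            from_state = matvec(self.w_rec, h)
            h = [math.tanh(a + b) for a, b in zip(from_char, from_state)]
        return h

    def num_parameters(self):
        """Count character-embedding and RNN weights (no biases in this sketch)."""
        return (
            len(self.char_emb) * self.char_dim
            + self.hidden_dim * self.char_dim
            + self.hidden_dim * self.hidden_dim
        )


embedder = CharAwareEmbedder("abcdefghijklmnopqrstuvwxyz")
play = embedder.embed("play")
played = embedder.embed("played")
```

In this sketch, "play" and "played" pass through the same character embeddings and recurrent weights, which is the sense in which morphologically related WSUs become directly coupled. The parameter count depends only on the alphabet size and the RNN dimensions, whereas a lookup table would need one vector per WSU in the vocabulary.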

Pages: 949-955

Page count: 7
