CHARACTER-AWARE ATTENTION-BASED END-TO-END SPEECH RECOGNITION

Authors
Meng, Zhong [1]
Gaur, Yashesh [1]
Li, Jinyu [1]
Gong, Yifan [1]
Affiliations
[1] Microsoft Corp, Redmond, WA 98052 USA
Keywords
character-aware; end-to-end; attention; encoder-decoder; speech recognition; NEURAL-NETWORKS;
DOI
10.1109/asru46091.2019.9004018
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Predicting words and subword units (WSUs) as the output has been shown to be effective for the attention-based encoder-decoder (AED) model in end-to-end speech recognition. However, as one input to the decoder recurrent neural network (RNN), each WSU embedding is learned independently through context and acoustic information in a purely data-driven fashion. Little effort has been made to explicitly model the morphological relationships among WSUs. In this work, we propose a novel character-aware (CA) AED model in which each WSU embedding is computed by summarizing the embeddings of its constituent characters using a CA-RNN. This WSU-independent CA-RNN is jointly trained with the encoder, the decoder and the attention network of a conventional AED to predict WSUs. With CA-AED, the embeddings of morphologically similar WSUs are naturally and directly correlated through the CA-RNN, in addition to the semantic and acoustic relations modeled by a traditional AED. Moreover, CA-AED significantly reduces the number of model parameters relative to a traditional AED by replacing the large pool of WSU embeddings with a much smaller set of character embeddings. On a 3400-hour Microsoft Cortana dataset, CA-AED achieves up to 11.9% relative word error rate (WER) improvement over a strong AED baseline with 27.1% fewer model parameters.
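The idea of computing a WSU embedding from its characters can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and is not taken from the paper: the class `CharAwareEmbedder`, the helper `lookup_table_parameters`, the choice of a plain tanh RNN whose final hidden state serves as the summary, the random untrained weights, and all sizes. The abstract does not state which RNN cell or dimensions the CA-RNN uses.

```python
import math
import random


def make_matrix(rows, cols, rng):
    """Random small weights; a trained model would learn these jointly."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    """Plain matrix-vector product on nested lists."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


class CharAwareEmbedder:
    """Hypothetical stand-in for a CA-RNN: a tanh RNN over character embeddings."""

    def __init__(self, alphabet, char_dim, hidden_dim, seed=0):
        rng = random.Random(seed)
        self.char_dim = char_dim
        self.hidden_dim = hidden_dim
        self.char_index = {ch: i for i, ch in enumerate(alphabet)}
        self.char_emb = make_matrix(len(alphabet), char_dim, rng)  # one row per character
        self.w_in = make_matrix(hidden_dim, char_dim, rng)         # input-to-hidden weights
        self.w_rec = make_matrix(hidden_dim, hidden_dim, rng)      # hidden-to-hidden weights
        self.bias = [0.0] * hidden_dim

    def embed(self, wsu):
        """Summarize the characters of one word or subword unit into a vector."""
        h = [0.0] * self.hidden_dim
        for ch in wsu:
            x = self.char_emb[self.char_index[ch]]
            a = matvec(self.w_in, x)
            b = matvec(self.w_rec, h)
            h = [math.tanh(ai + bi + ci) for ai, bi, ci in zip(a, b, self.bias)]
        return h  # assumption: the final hidden state is used as the WSU embedding

    def num_parameters(self):
        """Parameters shared by all WSUs; independent of the WSU vocabulary size."""
        return (len(self.char_emb) * self.char_dim
                + self.hidden_dim * self.char_dim
                + self.hidden_dim * self.hidden_dim
                + self.hidden_dim)


def lookup_table_parameters(vocab_size, embed_dim):
    """Parameters of a conventional table with one independent vector per WSU."""
    return vocab_size * embed_dim


embedder = CharAwareEmbedder("abcdefghijklmnopqrstuvwxyz", char_dim=8, hidden_dim=16)
speech_vec = embedder.embed("speech")
speeches_vec = embedder.embed("speeches")  # shares the first six RNN steps with "speech"
print(len(speech_vec), embedder.num_parameters(), lookup_table_parameters(1000, 16))
```

Because the same character embeddings and recurrent weights produce every WSU vector, morphologically similar units share most of their computation, and the parameter count depends on the alphabet size rather than on the number of WSUs. The sizes above are toy values and do not correspond to the 27.1% reduction reported in the abstract.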
Pages: 949-955
Page count: 7