CHARACTER-AWARE ATTENTION-BASED END-TO-END SPEECH RECOGNITION

Authors
Meng, Zhong [1]
Gaur, Yashesh [1]
Li, Jinyu [1]
Gong, Yifan [1]
Affiliation
[1] Microsoft Corp, Redmond, WA 98052 USA
Keywords
character-aware; end-to-end; attention; encoder-decoder; speech recognition; neural networks
DOI
10.1109/asru46091.2019.9004018
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Predicting words and subword units (WSUs) as the output has been shown to be effective for the attention-based encoder-decoder (AED) model in end-to-end speech recognition. However, as one input to the decoder recurrent neural network (RNN), each WSU embedding is learned independently through context and acoustic information in a purely data-driven fashion. Little effort has been made to explicitly model the morphological relationships among WSUs. In this work, we propose a novel character-aware (CA) AED model in which each WSU embedding is computed by summarizing the embeddings of its constituent characters using a CA-RNN. This WSU-independent CA-RNN is jointly trained with the encoder, the decoder, and the attention network of a conventional AED to predict WSUs. With CA-AED, the embeddings of morphologically similar WSUs are naturally and directly correlated through the CA-RNN, in addition to the semantic and acoustic relations modeled by a traditional AED. Moreover, CA-AED significantly reduces the number of model parameters relative to a traditional AED by replacing the large pool of WSU embeddings with a much smaller set of character embeddings. On a 3400-hour Microsoft Cortana dataset, CA-AED achieves up to 11.9% relative WER improvement over a strong AED baseline with 27.1% fewer model parameters.
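The central idea of the abstract, computing each WSU embedding from its characters with a shared RNN instead of storing one vector per WSU, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the class name `CharAwareEmbedder`, the plain tanh (Elman) recurrence without biases, the dimensions, and the random untrained weights. The abstract does not state which RNN cell or sizes the authors use.

```python
import math
import random


def make_matrix(rows, cols, rng):
    """Return a rows x cols matrix of small random weights (untrained)."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class CharAwareEmbedder:
    """Hypothetical toy CA-RNN: a tanh RNN that reads a WSU character by
    character and returns its final hidden state as the WSU embedding."""

    def __init__(self, alphabet, char_dim=4, hidden_dim=6, seed=0):
        rng = random.Random(seed)
        self.char_dim = char_dim
        self.hidden_dim = hidden_dim
        # One small embedding per character, shared by every WSU.
        self.char_emb = {
            ch: [rng.uniform(-0.1, 0.1) for _ in range(char_dim)]
            for ch in alphabet
        }
        self.w_in = make_matrix(hidden_dim, char_dim, rng)     # input weights
        self.w_rec = make_matrix(hidden_dim, hidden_dim, rng)  # recurrent weights

    def embed(self, wsu):
        """Summarize the character embeddings of `wsu` into one vector."""
        h = [0.0] * self.hidden_dim
        for ch in wsu:
            x = self.char_emb[ch]
            h = [
                math.tanh(
                    sum(w * xi for w, xi in zip(self.w_in[j], x))
                    + sum(w * hi for w, hi in zip(self.w_rec[j], h))
                )
                for j in range(self.hidden_dim)
            ]
        return h

    def num_parameters(self):
        """Character table plus the two RNN weight matrices."""
        return (
            len(self.char_emb) * self.char_dim
            + self.hidden_dim * self.char_dim
            + self.hidden_dim * self.hidden_dim
        )


def lookup_table_parameters(vocab_size, embed_dim):
    """Size of a conventional table with one independent vector per WSU."""
    return vocab_size * embed_dim


embedder = CharAwareEmbedder("abcdefghijklmnopqrstuvwxyz")
walk = embedder.embed("walk")
walked = embedder.embed("walked")  # shares the character path of "walk"
```

In this sketch the embedder's size depends on the alphabet and the RNN dimensions rather than on the WSU vocabulary, which is the mechanism behind the parameter reduction the abstract describes. The toy sizes here say nothing about the 27.1% figure reported for the real system.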
Pages: 949-955
Number of pages: 7