SUBWORD REGULARIZATION AND BEAM SEARCH DECODING FOR END-TO-END AUTOMATIC SPEECH RECOGNITION

Authors
Drexler, Jennifer [1]
Glass, James [1]
Affiliations
[1] MIT, Comp Sci & Artificial Intelligence Lab, 77 Massachusetts Ave, Cambridge, MA 02139 USA
Keywords
automatic speech recognition; subword units; beam search; CTC; attention
DOI
10.1109/icassp.2019.8683531
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification code
070206; 082403
Abstract
In this paper, we experiment with the recently introduced subword regularization technique [1] in the context of end-to-end automatic speech recognition (ASR). We present results from both attention-based and CTC-based ASR systems on two common benchmark datasets, the 80-hour Wall Street Journal corpus and the 1,000-hour Librispeech corpus. We also introduce a novel subword beam search decoding algorithm that significantly improves the final performance of the CTC-based systems. Overall, we find that subword regularization improves the performance of both types of ASR systems, with the regularized attention-based model performing best overall.
Pages: 6266-6270
Page count: 5