Label-Synchronous Neural Transducer for Adaptable Online E2E Speech Recognition

Cited: 0
Authors
Deng, Keqi [1]
Woodland, Philip C. [1]
Affiliations
[1] Univ Cambridge, Dept Engn, Cambridge CB2 1TN, England
Funding
UK Engineering and Physical Sciences Research Council (EPSRC);
Keywords
Domain adaptation; E2E ASR; neural transducer; LANGUAGE MODEL; RNN-TRANSDUCER; ARCHITECTURE; TRANSFORMER;
DOI
10.1109/TASLP.2024.3419421
Chinese Library Classification
O42 [Acoustics];
Subject Classification Codes
070206; 082403;
Abstract
Although end-to-end (E2E) automatic speech recognition (ASR) has shown state-of-the-art recognition accuracy, it tends to be implicitly biased towards the training data distribution, which can degrade generalisation. This paper proposes a label-synchronous neural transducer (LS-Transducer), which provides a natural approach to domain adaptation based on text-only data. The LS-Transducer extracts a label-level encoder representation before combining it with the prediction network output. Since blank tokens are no longer needed, the prediction network performs as a standard language model, which can be easily adapted using text-only data. An Auto-regressive Integrate-and-Fire (AIF) mechanism is proposed to generate the label-level encoder representation while retaining low-latency operation that can be used for streaming. In addition, a streaming joint decoding method is designed to improve ASR accuracy while retaining synchronisation with AIF. Experiments show that compared to standard neural transducers, the proposed LS-Transducer gave a 12.9% relative WER reduction (WERR) for intra-domain LibriSpeech data, as well as 21.4% and 24.6% relative WERRs on cross-domain TED-LIUM 2 and AESRC2020 data with an adapted prediction network.
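The AIF mechanism described in the abstract builds on the integrate-and-fire idea of turning frame-level encoder outputs into label-level representations. The paper's exact AIF formulation is not given in this record; the sketch below shows only the generic continuous integrate-and-fire (CIF) style of weight accumulation it extends, where per-frame weights are accumulated until a threshold is crossed and a label-level vector is "fired". All names (`cif_fire`, `alphas`) are illustrative, not from the paper.

```python
import numpy as np

def cif_fire(encoder_out, alphas, threshold=1.0):
    """CIF-style aggregation (simplified sketch, not the paper's AIF).

    encoder_out: (T, D) frame-level encoder features
    alphas:      (T,)   per-frame weights in (0, 1)
    Returns one label-level vector per threshold crossing.
    """
    labels = []
    acc = 0.0                                   # accumulated weight
    frame_acc = np.zeros(encoder_out.shape[1])  # weighted feature sum
    for h, a in zip(encoder_out, alphas):
        if acc + a >= threshold:
            # Split the frame's weight at the boundary: the part that
            # completes the threshold closes the current label ...
            used = threshold - acc
            labels.append(frame_acc + used * h)
            # ... and the remainder starts accumulating the next label.
            acc = a - used
            frame_acc = acc * h
        else:
            acc += a
            frame_acc = frame_acc + a * h
    return np.array(labels)
```

Because firing depends only on frames seen so far, this style of aggregation is compatible with the streaming, low-latency operation the abstract emphasises.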
Pages: 3507-3516
Page count: 10
Related Papers
25 records in total
  • [1] Label-Synchronous Neural Transducer for E2E Simultaneous Speech Translation
    Deng, Keqi
    Woodland, Philip C.
    PROCEEDINGS OF THE 62ND ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, VOL 1: LONG PAPERS, 2024, : 8235 - 8251
  • [2] Deep Neural Network Calibration for E2E Speech Recognition System
    Lee, Mun-Hak
    Chang, Joon-Hyuk
    INTERSPEECH 2021, 2021, : 4064 - 4068
  • [3] Dissecting User-Perceived Latency of On-Device E2E Speech Recognition
    Shangguan, Yuan
    Prabhavalkar, Rohit
    Su, Hang
    Mahadeokar, Jay
    Shi, Yangyang
    Zhou, Jiatong
    Wu, Chunyang
    Le, Duc
    Kalinli, Ozlem
    Fuegen, Christian
    Seltzer, Michael L.
    INTERSPEECH 2021, 2021, : 4553 - 4557
  • [4] Rapid Language Adaptation for Multilingual E2E Speech Recognition Using Encoder Prompting
    Kashiwagi, Yosuke
    Futami, Hayato
    Tsunoo, Emiru
    Arora, Siddhant
    Watanabe, Shinji
    INTERSPEECH 2024, 2024, : 2900 - 2904
  • [5] Integration of Frame- and Label-synchronous Beam Search for Streaming Encoder-decoder Speech Recognition
    Tsunoo, Emiru
    Futami, Hayato
    Kashiwagi, Yosuke
    Arora, Siddhant
    Watanabe, Shinji
    INTERSPEECH 2023, 2023, : 1369 - 1373
  • [6] USS DIRECTED E2E SPEECH SYNTHESIS FOR INDIAN LANGUAGES
    Srivastava, Sudhanshu
    Murthy, Hema A.
    2022 IEEE INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING AND COMMUNICATIONS, SPCOM, 2022,
  • [7] Leveraging Phone Mask Training for Phonetic-Reduction-Robust E2E Uyghur Speech Recognition
    Ma, Guodong
    Hu, Pengfei
    Kang, Jian
    Huang, Shen
    Huang, Hao
    INTERSPEECH 2021, 2021, : 306 - 310
  • [8] Few-shot learning for E2E speech recognition: architectural variants for support set generation
    Eledath, Dhanya
    Thurlapati, Narasimha Rao
    Pavithra, V.
    Banerjee, Tirthankar
    Ramasubramanian, V.
    2022 30TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO 2022), 2022, : 444 - 448
  • [9] INTERNAL LANGUAGE MODEL PERSONALIZATION OF E2E AUTOMATIC SPEECH RECOGNITION USING RANDOM ENCODER FEATURES
    Stooke, Adam
    Sim, Khe Chai
    Chua, Mason
    Munkhdalai, Tsendsuren
    Strohman, Trevor
    2022 IEEE SPOKEN LANGUAGE TECHNOLOGY WORKSHOP, SLT, 2022, : 213 - 220
  • [10] DIRECTIONAL ASR: A NEW PARADIGM FOR E2E MULTI-SPEAKER SPEECH RECOGNITION WITH SOURCE LOCALIZATION
    Subramanian, Aswin Shanmugam
    Weng, Chao
    Watanabe, Shinji
    Yu, Meng
    Xu, Yong
    Zhang, Shi-Xiong
    Yu, Dong
    2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 8433 - 8437