Label-Synchronous Neural Transducer for Adaptable Online E2E Speech Recognition

Cited: 0
Authors
Deng, Keqi [1]
Woodland, Philip C. [1]
Affiliations
[1] Univ Cambridge, Dept Engn, Cambridge CB2 1TN, England
Funding
UK Engineering and Physical Sciences Research Council (EPSRC);
Keywords
Domain adaptation; E2E ASR; neural transducer; LANGUAGE MODEL; RNN-TRANSDUCER; ARCHITECTURE; TRANSFORMER;
DOI
10.1109/TASLP.2024.3419421
Chinese Library Classification
O42 [Acoustics];
Subject Classification Codes
070206; 082403;
Abstract
Although end-to-end (E2E) automatic speech recognition (ASR) has shown state-of-the-art recognition accuracy, it tends to be implicitly biased towards the training data distribution, which can degrade generalisation. This paper proposes a label-synchronous neural transducer (LS-Transducer), which provides a natural approach to domain adaptation based on text-only data. The LS-Transducer extracts a label-level encoder representation before combining it with the prediction network output. Since blank tokens are no longer needed, the prediction network performs as a standard language model, which can be easily adapted using text-only data. An Auto-regressive Integrate-and-Fire (AIF) mechanism is proposed to generate the label-level encoder representation while retaining low-latency operation that can be used for streaming. In addition, a streaming joint decoding method is designed to improve ASR accuracy while retaining synchronisation with AIF. Experiments show that compared to standard neural transducers, the proposed LS-Transducer gave a 12.9% relative WER reduction (WERR) for intra-domain LibriSpeech data, as well as 21.4% and 24.6% relative WERRs on cross-domain TED-LIUM 2 and AESRC2020 data with an adapted prediction network.
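The AIF mechanism described in the abstract builds on the integrate-and-fire idea of turning frame-level encoder outputs into label-level representations. The paper's exact AIF formulation is not given in this record; the sketch below shows only the generic continuous integrate-and-fire (CIF) style of weight accumulation it extends, where per-frame weights are accumulated until a threshold is crossed and a label-level vector is "fired". All names (`cif_fire`, `alphas`) are illustrative, not from the paper.

```python
import numpy as np

def cif_fire(encoder_out, alphas, threshold=1.0):
    """CIF-style aggregation (simplified sketch, not the paper's AIF).

    encoder_out: (T, D) frame-level encoder features
    alphas:      (T,)   per-frame weights in (0, 1)
    Returns one label-level vector per threshold crossing.
    """
    labels = []
    acc = 0.0                                   # accumulated weight
    frame_acc = np.zeros(encoder_out.shape[1])  # weighted feature sum
    for h, a in zip(encoder_out, alphas):
        if acc + a >= threshold:
            # Split the frame's weight at the boundary: the part that
            # completes the threshold closes the current label ...
            used = threshold - acc
            labels.append(frame_acc + used * h)
            # ... and the remainder starts accumulating the next label.
            acc = a - used
            frame_acc = acc * h
        else:
            acc += a
            frame_acc = frame_acc + a * h
    return np.array(labels)
```

Because firing depends only on frames seen so far, this style of aggregation is compatible with the streaming, low-latency operation the abstract emphasises.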
Pages: 3507-3516
Page count: 10
Related Papers
25 records in total
  • [1] Label-Synchronous Neural Transducer for E2E Simultaneous Speech Translation
    Deng, Keqi
    Woodland, Philip C.
    PROCEEDINGS OF THE 62ND ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, VOL 1: LONG PAPERS, 2024, : 8235 - 8251
  • [2] Deep Neural Network Calibration for E2E Speech Recognition System
    Lee, Mun-Hak
    Chang, Joon-Hyuk
    INTERSPEECH 2021, 2021, : 4064 - 4068
  • [3] Dissecting User-Perceived Latency of On-Device E2E Speech Recognition
    Shangguan, Yuan
    Prabhavalkar, Rohit
    Su, Hang
    Mahadeokar, Jay
    Shi, Yangyang
    Zhou, Jiatong
    Wu, Chunyang
    Le, Duc
    Kalinli, Ozlem
    Fuegen, Christian
    Seltzer, Michael L.
    INTERSPEECH 2021, 2021, : 4553 - 4557
  • [4] Rapid Language Adaptation for Multilingual E2E Speech Recognition Using Encoder Prompting
    Kashiwagi, Yosuke
    Futami, Hayato
    Tsunoo, Emiru
    Arora, Siddhant
    Watanabe, Shinji
    INTERSPEECH 2024, 2024, : 2900 - 2904
  • [5] Integration of Frame- and Label-synchronous Beam Search for Streaming Encoder-decoder Speech Recognition
    Tsunoo, Emiru
    Futami, Hayato
    Kashiwagi, Yosuke
    Arora, Siddhant
    Watanabe, Shinji
    INTERSPEECH 2023, 2023, : 1369 - 1373
  • [6] USS DIRECTED E2E SPEECH SYNTHESIS FOR INDIAN LANGUAGES
    Srivastava, Sudhanshu
    Murthy, Hema A.
    2022 IEEE INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING AND COMMUNICATIONS, SPCOM, 2022,
  • [7] Leveraging Phone Mask Training for Phonetic-Reduction-Robust E2E Uyghur Speech Recognition
    Ma, Guodong
    Hu, Pengfei
    Kang, Jian
    Huang, Shen
    Huang, Hao
    INTERSPEECH 2021, 2021, : 306 - 310
  • [8] Few-shot learning for E2E speech recognition: architectural variants for support set generation
    Eledath, Dhanya
    Thurlapati, Narasimha Rao
    Pavithra, V.
    Banerjee, Tirthankar
    Ramasubramanian, V.
    2022 30TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO 2022), 2022, : 444 - 448
  • [9] INTERNAL LANGUAGE MODEL PERSONALIZATION OF E2E AUTOMATIC SPEECH RECOGNITION USING RANDOM ENCODER FEATURES
    Stooke, Adam
    Sim, Khe Chai
    Chua, Mason
    Munkhdalai, Tsendsuren
    Strohman, Trevor
    2022 IEEE SPOKEN LANGUAGE TECHNOLOGY WORKSHOP, SLT, 2022, : 213 - 220
  • [10] DIRECTIONAL ASR: A NEW PARADIGM FOR E2E MULTI-SPEAKER SPEECH RECOGNITION WITH SOURCE LOCALIZATION
    Subramanian, Aswin Shanmugam
    Weng, Chao
    Watanabe, Shinji
    Yu, Meng
    Xu, Yong
    Zhang, Shi-Xiong
    Yu, Dong
    2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 8433 - 8437