Few-shot learning for E2E speech recognition: architectural variants for support set generation

Times cited: 0

Authors:

Eledath, Dhanya [1]

Thurlapati, Narasimha Rao [2]

Pavithra, V [2]

Banerjee, Tirthankar [3]

Ramasubramanian, V [3]

Affiliations:

[1] International Institute of Information Technology Bangalore (IIITB), Bangalore, Karnataka, India

[2] Samsung R&D Institute Bangalore (SRI-B), Bangalore, Karnataka, India

[3] IIIT Bangalore, Bangalore, Karnataka, India

Source:

2022 30TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO 2022) | 2022

Keywords:

Few-shot Learning; Matching Networks; Continuous Speech Recognition; Coupled and Uncoupled architectures; Support Set Generation;

DOI:

Not available

Chinese Library Classification (CLC) number:

O42 [Acoustics]

Discipline classification codes:

070206; 082403

Abstract:

In this paper, we propose two architectural variants of our recent adaptation of the few-shot learning (FSL) framework 'Matching Networks' (MN) to end-to-end (E2E) continuous speech recognition (CSR). This formulation, termed 'MN-CTC', involves CTC-loss-based end-to-end episodic training of MN and an associated CTC-based decoding of continuous speech. An important component of MN theory is the labelled support set used during training and inference. The architectural variants proposed and studied here for E2E CSR, namely the 'Uncoupled MN-CTC' and the 'Coupled MN-CTC', address the problem of generating supervised support sets from continuous speech. The 'Uncoupled MN-CTC' generates the support sets 'outside' the MN architecture, whereas the 'Coupled MN-CTC' variant is a derivative framework that generates the support set 'within' the MN architecture through a multitask formulation: the support-set generation loss is coupled with the main MN-CTC loss so that the support sets and the embedding functions of MN are optimized jointly. On the TIMIT and Librispeech datasets, we establish the few-shot effectiveness of the proposed variants in terms of PER and LER. We also demonstrate the cross-domain applicability of the MN-CTC formulation: a Librispeech-trained 'Coupled MN-CTC' variant, performing inference on the low-resource TIMIT target corpus, achieves an 8% (absolute) LER advantage over a single-domain (TIMIT-only) scenario.
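The abstract describes MN-CTC only at a high level. As a minimal, pure-Python sketch of the general idea (not the authors' implementation), a Matching Networks-style attention over a labelled support set can turn each query frame into a label distribution, and the standard CTC forward algorithm can then score a label sequence against those per-frame distributions. Assumptions not taken from the paper: the embedding functions are omitted (frames and support items are already embedded vectors), attention is a softmax over cosine similarities as in the original Matching Networks formulation, label 0 is the CTC blank and is represented in the support set, and the names `mn_posteriors`, `ctc_loss`, `greedy_decode`, `coupled_loss` and the weight `lam` are hypothetical. `coupled_loss` is only a generic weighted multitask sum; the paper's actual support-set generation loss is not specified here.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical types: an embedded frame is a sequence of floats;
# a support item pairs an embedding with an integer label (0 = CTC blank).
Vec = Sequence[float]
SupportItem = Tuple[Vec, int]


def cosine(u: Vec, v: Vec) -> float:
    """Cosine similarity; the small constant avoids division by zero."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / (norm + 1e-12)


def mn_posteriors(frames: Sequence[Vec], support: Sequence[SupportItem],
                  num_labels: int) -> List[List[float]]:
    """Matching Networks-style classifier applied frame by frame:
    P(y | x) = sum_k softmax_k(cos(x, s_k)) * onehot(y_k)."""
    posteriors = []
    for x in frames:
        sims = [cosine(x, emb) for emb, _ in support]
        top = max(sims)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - top) for s in sims]
        total = sum(weights)
        p = [0.0] * num_labels
        for w, (_, label) in zip(weights, support):
            p[label] += w / total
        posteriors.append(p)
    return posteriors


def logsumexp(xs: Sequence[float]) -> float:
    top = max(xs)
    if top == -math.inf:
        return -math.inf
    return top + math.log(sum(math.exp(x - top) for x in xs))


def ctc_loss(probs: Sequence[Sequence[float]], target: Sequence[int],
             blank: int = 0, eps: float = 1e-8) -> float:
    """Negative log-likelihood of `target` via the CTC forward algorithm."""
    ext = [blank]
    for c in target:
        ext += [c, blank]  # blank-interleaved label sequence
    S = len(ext)
    logp = [[math.log(pk + eps) for pk in p] for p in probs]
    alpha = [-math.inf] * S
    alpha[0] = logp[0][ext[0]]
    if S > 1:
        alpha[1] = logp[0][ext[1]]
    for t in range(1, len(probs)):
        new = [-math.inf] * S
        for s in range(S):
            paths = [alpha[s]]
            if s >= 1:
                paths.append(alpha[s - 1])
            # skipping over a blank is allowed only between different labels
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                paths.append(alpha[s - 2])
            new[s] = logsumexp(paths) + logp[t][ext[s]]
        alpha = new
    # valid paths end on the last label or on the trailing blank
    return -logsumexp(alpha[-2:] if S > 1 else alpha)


def greedy_decode(probs: Sequence[Sequence[float]], blank: int = 0) -> List[int]:
    """Best-path CTC decoding: per-frame argmax, collapse repeats, drop blanks."""
    out, prev = [], None
    for p in probs:
        k = max(range(len(p)), key=p.__getitem__)
        if k != blank and k != prev:
            out.append(k)
        prev = k
    return out


def coupled_loss(mn_ctc: float, support_gen: float, lam: float = 0.5) -> float:
    """Hypothetical multitask objective; `lam` is an assumed weight."""
    return mn_ctc + lam * support_gen


if __name__ == "__main__":
    support = [([1.0, 0.0, 0.0], 0), ([0.0, 1.0, 0.0], 1), ([0.0, 0.0, 1.0], 2)]
    frames = [[0.0, 1.0, 0.0], [0.1, 0.9, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
    probs = mn_posteriors(frames, support, num_labels=3)
    print(greedy_decode(probs))  # [1, 2]
    print(ctc_loss(probs, [1, 2]) < ctc_loss(probs, [2, 1]))  # True
```

Because every per-frame distribution comes from attention over the support set, the CTC loss gradient (in a differentiable framework) would reach both the embeddings and, in a coupled setup, whatever module produces the support set; this pure-Python version only shows the forward computation.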


Pages: 444-448

Number of pages: 5

Related records (33 in total):

[1] Few-shot Learning for Low-resource E2E ASR: Mono-, Cross- and Multi-lingual Scenarios
Eledath, Dhanya
Ramasubramanian, V
2024 INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING AND COMMUNICATIONS, SPCOM 2024, 2024
[2] Deep Neural Network Calibration for E2E Speech Recognition System
Lee, Mun-Hak
Chang, Joon-Hyuk
INTERSPEECH 2021, 2021: 4064-4068
[3] Learning Relative Feature Displacement for Few-Shot Open-Set Recognition
Deng, Shule
Yu, Jin-Gang
Wu, Zihao
Gao, Hongxia
Li, Yansheng
Yang, Yang
IEEE TRANSACTIONS ON MULTIMEDIA, 2023, 25: 5763-5774
[4] Glocal Energy-based Learning for Few-Shot Open-Set Recognition
Wang, Haoyu
Pang, Guansong
Wang, Peng
Zhang, Lei
Wei, Wei
Zhang, Yanning
2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR, 2023: 7507-7516
[5] NC2E: boosting few-shot learning with novel class center estimation
Wu, Zheng
Shen, Changchun
Guo, Kehua
Luo, Entao
Wang, Liwei
NEURAL COMPUTING & APPLICATIONS, 2023, 35 (09): 7049-7062
[6] Label-Synchronous Neural Transducer for Adaptable Online E2E Speech Recognition
Deng, Keqi
Woodland, Philip C.
IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2024, 32: 3507-3516
[7] Dissecting User-Perceived Latency of On-Device E2E Speech Recognition
Shangguan, Yuan
Prabhavalkar, Rohit
Su, Hang
Mahadeokar, Jay
Shi, Yangyang
Zhou, Jiatong
Wu, Chunyang
Le, Duc
Kalinli, Ozlem
Fuegen, Christian
Seltzer, Michael L.
INTERSPEECH 2021, 2021: 4553-4557
[8] Cross-Corpus Speech Emotion Recognition Based on Few-Shot Learning and Domain Adaptation
Ahn, Youngdo
Lee, Sung Joo
Shin, Jong Won
IEEE SIGNAL PROCESSING LETTERS, 2021, 28: 1190-1194
[9] Rapid Language Adaptation for Multilingual E2E Speech Recognition Using Encoder Prompting
Kashiwagi, Yosuke
Futami, Hayato
Tsunoo, Emiru
Arora, Siddhant
Watanabe, Shinji
INTERSPEECH 2024, 2024: 2900-2904
[10] TNPNet: An approach to Few-shot open-set recognition via contextual transductive learning
Wu, Shaoling
Luo, Huilan
Lin, Xiaoming
NEUROCOMPUTING, 2025, 621
