Few-shot learning for E2E speech recognition: architectural variants for support set generation

Cited by: 0
Authors
Eledath, Dhanya [1]
Thurlapati, Narasimha Rao [2]
Pavithra, V [2]
Banerjee, Tirthankar [3]
Ramasubramanian, V [3]
Affiliations
[1] Int Inst Informat Technol Bangalore IIITB, Bangalore, Karnataka, India
[2] Samsung R&D Inst Bangalore SRI B, Bangalore, Karnataka, India
[3] IIIT Bangalore, Bangalore, Karnataka, India
Keywords
Few-shot Learning; Matching Networks; Continuous Speech Recognition; Coupled and Uncoupled architectures; Support Set Generation;
DOI
Not available
Chinese Library Classification
O42 [Acoustics];
Subject classification codes
070206; 082403;
Abstract
In this paper, we propose two architectural variants of our recent adaptation of the few-shot learning (FSL) framework 'Matching Networks' (MN) to end-to-end (E2E) continuous speech recognition (CSR). That adaptation, termed 'MN-CTC', involves CTC-loss-based end-to-end episodic training of MN and an associated CTC-based decoding of continuous speech. An important component of MN theory is the labelled support set used during training and inference. The architectural variants proposed and studied here for E2E CSR, namely the 'Uncoupled MN-CTC' and the 'Coupled MN-CTC', address the problem of generating supervised support sets from continuous speech. While the 'Uncoupled MN-CTC' generates the support sets outside the MN architecture, the 'Coupled MN-CTC' is a derivative framework that generates the support set within the MN architecture through a multitask formulation that couples the support-set generation loss with the main MN-CTC loss, jointly optimizing the support sets and the embedding functions of MN. On the TIMIT and Librispeech datasets, we establish the few-shot effectiveness of the proposed variants in terms of PER and LER. We also demonstrate the cross-domain applicability of the MN-CTC formulation: a Librispeech-trained 'Coupled MN-CTC' variant performing inference on the low-resource TIMIT target corpus achieves an 8% (absolute) LER advantage over a single-domain (TIMIT-only) scenario.
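As a rough illustration of the role the labelled support set plays, the sketch below follows the general Matching Networks recipe: each query frame attends over the support examples by cosine similarity, and its label posterior is the attention-weighted sum of the support labels. It is a minimal sketch under stated assumptions, not the paper's implementation. The function names, the NumPy formulation, the array shapes, and the `weight` parameter in `coupled_loss` are all hypothetical; the abstract only states that the Coupled variant combines the support-set generation loss with the MN-CTC loss, not how the two are weighted.

```python
import numpy as np


def matching_posteriors(query_emb, support_emb, support_labels, num_classes):
    """Per-frame label posteriors in the style of Matching Networks.

    query_emb:      (T, D) embedded query frames
    support_emb:    (K, D) embedded support examples
    support_labels: (K,) integer class ids of the support examples
    """
    # Cosine similarity between every query frame and every support example.
    q = query_emb / np.linalg.norm(query_emb, axis=1, keepdims=True)
    s = support_emb / np.linalg.norm(support_emb, axis=1, keepdims=True)
    sim = q @ s.T                                   # (T, K)

    # Softmax attention over the support set (shifted for numerical stability).
    sim = sim - sim.max(axis=1, keepdims=True)
    attn = np.exp(sim)
    attn /= attn.sum(axis=1, keepdims=True)         # each row sums to 1

    # Posterior = attention-weighted sum of one-hot support labels.
    one_hot = np.eye(num_classes)[support_labels]   # (K, C)
    return attn @ one_hot                           # (T, C)


def coupled_loss(mn_ctc_loss, support_loss, weight=0.5):
    """Hypothetical multitask combination; the weighting is an assumption."""
    return mn_ctc_loss + weight * support_loss
```

In a CTC setting, per-frame posteriors of this kind would be the quantities passed to the CTC loss during training and to the CTC decoder at inference; classes with no support example receive zero probability, which is why the quality of the generated support set matters.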
Pages: 444-448
Number of pages: 5