META-LEARNING FOR IMPROVING RARE WORD RECOGNITION IN END-TO-END ASR

Cited by: 4
Authors
Lux, Florian [1]
Vu, Ngoc Thang [1]
Institution
[1] Univ Stuttgart, Inst Nat Language Proc, D-70569 Stuttgart, Germany
Keywords
meta learning; keyword spotting; speech recognition; speech embedding
DOI
10.1109/ICASSP39728.2021.9414298
Chinese Library Classification (CLC) number
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
In this work, we take on the challenge of rare word recognition in end-to-end (E2E) automatic speech recognition (ASR) by integrating a meta learning mechanism into an E2E ASR system, enabling few-shot adaptation. We make three contributions: a novel method for generating speech embeddings; modifications to four meta learning approaches that enable them to perform keyword spotting; and an approach for using their outputs in an E2E ASR system. We verify the functionality of each contribution in two experiments that explore performance for different numbers of classes (N-way) and examples per class (k-shot) in a few-shot setting. We find that the information encoded in the speech embeddings suffices for the modified meta learning approaches to perform keyword spotting on continuous signals. Despite the simplicity of the interface between keyword spotting and speech recognition, we consistently improve the word error rate by up to 5%.
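The abstract does not name the four meta learning approaches, so the following is only a generic illustration of N-way, k-shot keyword spotting over speech embeddings, not the authors' method. It assumes a prototypical-network-style nearest-prototype classifier (a common meta learning baseline, swapped in here for illustration) plus a distance threshold so that segments matching no keyword can be rejected. The 2-D vectors are toy stand-ins for real speech embeddings, and the names `build_prototypes`, `spot_keyword`, and the threshold value are hypothetical.

```python
import math
from typing import Dict, List, Optional

Vector = List[float]


def mean_vector(vectors: List[Vector]) -> Vector:
    """Element-wise mean of k equally sized embedding vectors."""
    k = len(vectors)
    return [sum(column) / k for column in zip(*vectors)]


def euclidean(a: Vector, b: Vector) -> float:
    """Euclidean distance between two embeddings."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def build_prototypes(support: Dict[str, List[Vector]]) -> Dict[str, Vector]:
    """N-way, k-shot support set -> one prototype per class (mean of its k embeddings)."""
    return {label: mean_vector(examples) for label, examples in support.items()}


def spot_keyword(query: Vector, prototypes: Dict[str, Vector],
                 threshold: float) -> Optional[str]:
    """Return the nearest keyword, or None if even the nearest prototype
    is farther than the (hypothetical) rejection threshold."""
    label, dist = min(
        ((lbl, euclidean(query, proto)) for lbl, proto in prototypes.items()),
        key=lambda pair: pair[1],
    )
    return label if dist <= threshold else None


# Toy 2-way, 2-shot episode with made-up embeddings.
support = {
    "stuttgart": [[1.0, 0.0], [0.8, 0.2]],
    "tuebingen": [[0.0, 1.0], [0.2, 0.8]],
}
prototypes = build_prototypes(support)
print(spot_keyword([0.95, 0.05], prototypes, threshold=0.5))  # → stuttgart
print(spot_keyword([5.0, 5.0], prototypes, threshold=0.5))    # → None
```

The threshold is what turns plain N-way classification into spotting: in a continuous signal most segments contain none of the N keywords, so the classifier must be allowed to answer "none of these".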
Pages: 5974-5978
Number of pages: 5