META-LEARNING FOR IMPROVING RARE WORD RECOGNITION IN END-TO-END ASR

Cited by: 4
Authors
Lux, Florian [1]
Ngoc Thang Vu [1]
Affiliation
[1] Univ Stuttgart, Inst Nat Language Proc, D-70569 Stuttgart, Germany
Keywords
meta learning; keyword spotting; speech recognition; speech embedding
DOI
10.1109/ICASSP39728.2021.9414298
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
In this work we take on the challenge of rare word recognition in end-to-end (E2E) automatic speech recognition (ASR) by integrating a meta learning mechanism into an E2E ASR system, enabling few-shot adaptation. We propose a novel method of generating embeddings for speech, changes to four meta learning approaches that enable them to perform keyword spotting, and an approach to using their outcomes in an E2E ASR system. We verify the functionality of each of our three contributions in two experiments exploring their performance for different numbers of classes (N-way) and examples per class (k-shot) in a few-shot setting. We find that the information encoded in the speech embeddings suffices to allow the modified meta learning approaches to perform continuous signal spotting. Despite the simplicity of the interface between keyword spotting and speech recognition, we are able to consistently improve word error rate by up to 5%.
Pages: 5974-5978
Page count: 5