META-LEARNING FOR IMPROVING RARE WORD RECOGNITION IN END-TO-END ASR

Cited by: 4

Authors:

Lux, Florian [1]

Ngoc Thang Vu [1]

Affiliations:

[1] Univ Stuttgart, Inst Nat Language Proc, D-70569 Stuttgart, Germany

Source:

2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021) | 2021

Keywords:

meta learning; keyword spotting; speech recognition; speech embedding

DOI:

10.1109/ICASSP39728.2021.9414298

Chinese Library Classification (CLC) number:

O42 [Acoustics]

Discipline classification codes:

070206; 082403

Abstract:

In this work, we take on the challenge of rare word recognition in end-to-end (E2E) automatic speech recognition (ASR) by integrating a meta learning mechanism into an E2E ASR system, enabling few-shot adaptation. We make three contributions: a novel method of generating embeddings for speech, modifications to four meta learning approaches that enable them to perform keyword spotting, and an approach to using their outputs in an E2E ASR system. We verify the functionality of each contribution in two experiments that explore performance for different numbers of classes (N-way) and examples per class (k-shot) in a few-shot setting. We find that the information encoded in the speech embeddings suffices to allow the modified meta learning approaches to perform continuous signal spotting. Despite the simplicity of the interface between keyword spotting and speech recognition, we are able to consistently improve word error rate by up to 5%.


Pages: 5974-5978

Page count: 5
