Dealing with Unknowns in Continual Learning for End-to-end Automatic Speech Recognition

Cited by: 3

Authors:

Sustek, Martin [1,2]

Sadhu, Samik [2]

Hermansky, Hynek [1,2,3]

Affiliations:

[1] Brno University of Technology, Brno, Czech Republic

[2] Johns Hopkins University, Center for Language and Speech Processing, Baltimore, MD 21218, USA

[3] Johns Hopkins University, Human Language Technology Center of Excellence, Baltimore, MD 21218, USA

Source:

INTERSPEECH 2022 | 2022

Keywords:

speech recognition; continual learning; multi-stream speech recognition; environment

DOI:

10.21437/Interspeech.2022-11139

Chinese Library Classification number:

O42 [Acoustics]

Discipline classification codes:

070206; 082403

Abstract:

Learning continually from data is a task that humans execute effortlessly, but it remains a significant challenge for machines. Moreover, machines fail to generalize when they encounter unknown test scenarios. We propose a mathematically motivated, dynamically expanding end-to-end model of independent sequence-to-sequence components trained on different data sets. The model avoids catastrophically forgetting knowledge acquired from previously seen data while seamlessly integrating knowledge from new data. During inference, the likelihoods of the unknown test scenario are computed using internal model activation distributions. The inference made by each independent component is weighted by the normalized likelihood values to obtain the final decision.
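
The weighting step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the likelihoods are modeled or at which level the component outputs are combined, so the sketch assumes each component supplies one log-likelihood for the test input and one posterior distribution over output tokens. The function names `normalized_weights` and `combine_posteriors` are hypothetical.

```python
import math


def normalized_weights(log_likelihoods):
    """Turn per-component log-likelihoods into weights that sum to 1.

    Assumption: each component reports a single log-likelihood for the
    test input. Subtracting the maximum keeps exp() numerically stable.
    """
    peak = max(log_likelihoods)
    scaled = [math.exp(value - peak) for value in log_likelihoods]
    total = sum(scaled)
    return [value / total for value in scaled]


def combine_posteriors(posteriors, log_likelihoods):
    """Average component posteriors, weighted by normalized likelihoods.

    Assumption: every component outputs a distribution over the same
    token inventory, in the same order.
    """
    weights = normalized_weights(log_likelihoods)
    size = len(posteriors[0])
    return [
        sum(weight * posterior[i] for weight, posterior in zip(weights, posteriors))
        for i in range(size)
    ]


if __name__ == "__main__":
    # Two hypothetical components; the first fits the test input better.
    component_posteriors = [[0.7, 0.2, 0.1], [0.1, 0.3, 0.6]]
    component_log_likelihoods = [-10.0, -12.0]
    print(normalized_weights(component_log_likelihoods))
    print(combine_posteriors(component_posteriors, component_log_likelihoods))
```

Under these assumptions, a component whose likelihood for the test input is much lower than the others contributes almost nothing to the final decision, and adding a new component only extends the two input lists.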


Pages: 1046-1050

Page count: 5
