On multi-domain training and adaptation of end-to-end RNN acoustic models for distant speech recognition

Times Cited: 13
Authors
Mirsamadi, Seyedmandad [1 ]
Hansen, John H. L. [1 ]
Affiliations
[1] Univ Texas Dallas, CRSS, Richardson, TX 75080 USA
Keywords
distant speech recognition; recurrent neural network; multi-domain training
DOI
10.21437/Interspeech.2017-398
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
Recognition of distant (far-field) speech is a challenge for ASR due to the mismatch in recording conditions caused by room reverberation and environmental noise. Given the remarkable learning capacity of deep neural networks, there is increasing interest in addressing this problem by using a large corpus of reverberant far-field speech to train robust models. In this study, we explore how an end-to-end RNN acoustic model trained on speech from different rooms and acoustic conditions (different domains) achieves robustness to environmental variations. It is shown that the first hidden layer acts as a domain separator, projecting the data from different domains into different sub-spaces. The subsequent layers then use this encoded domain knowledge to map these features to final representations that are invariant to domain change. This mechanism is closely related to noise-aware or room-aware approaches which append manually extracted domain signatures to the input features. Additionally, we demonstrate how this understanding of the learning procedure provides useful guidance for model adaptation to new acoustic conditions. We present results on the AMI corpus to demonstrate the propagation of domain information in a deep RNN, and perform recognition experiments which indicate the role of encoded domain knowledge in the training and adaptation of RNN acoustic models.
Pages: 404-408
Number of Pages: 5
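The abstract's central claim, that the first hidden layer of a multi-domain RNN acoustic model separates the recording domains while deeper layers become domain-invariant, can be examined with a simple linear probe on per-layer activations. The following is a minimal, hypothetical sketch (PyTorch + scikit-learn), not the authors' code: the LSTMAcousticModel architecture, the domain_probe_accuracy helper, and all dimensions are illustrative assumptions.

```python
import torch
import torch.nn as nn
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split


class LSTMAcousticModel(nn.Module):
    """Stacked LSTM acoustic model that exposes per-layer activations (illustrative only)."""

    def __init__(self, feat_dim=40, hidden_dim=64, num_layers=4, num_targets=100):
        super().__init__()
        dims = [feat_dim] + [hidden_dim] * num_layers
        self.layers = nn.ModuleList(
            [nn.LSTM(dims[i], dims[i + 1], batch_first=True) for i in range(num_layers)]
        )
        self.output = nn.Linear(hidden_dim, num_targets)

    def forward(self, x, return_activations=False):
        acts = []
        h = x
        for lstm in self.layers:
            h, _ = lstm(h)           # h: (batch, frames, hidden_dim)
            acts.append(h)
        logits = self.output(h)       # frame-level output scores
        return (logits, acts) if return_activations else logits


def domain_probe_accuracy(model, feats, domains):
    """Fit a linear probe on time-averaged activations of each layer and return its
    held-out accuracy at predicting the recording domain (room / acoustic condition)."""
    model.eval()
    with torch.no_grad():
        _, acts = model(feats, return_activations=True)
    scores = []
    for layer_act in acts:
        X = layer_act.mean(dim=1).numpy()   # utterance-level summary vector per layer
        X_tr, X_te, y_tr, y_te = train_test_split(X, domains, test_size=0.3, random_state=0)
        probe = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
        scores.append(probe.score(X_te, y_te))
    return scores


if __name__ == "__main__":
    torch.manual_seed(0)
    model = LSTMAcousticModel()
    feats = torch.randn(200, 50, 40)                # 200 utterances, 50 frames, 40-dim features
    domains = torch.randint(0, 3, (200,)).numpy()   # e.g., 3 rooms / acoustic conditions
    print(domain_probe_accuracy(model, feats, domains))
```

With a trained model and real far-field features (for example, AMI-style utterances labelled by room or microphone condition), high probe accuracy at the first layer and declining accuracy at deeper layers would be consistent with the domain-separator behaviour the abstract describes; the random data in the __main__ block only demonstrates the mechanics of the probe.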