Improving the Training Recipe for a Robust Conformer-based Hybrid Model

Cited by: 2

Authors:

Zeineldeen, Mohammad [1,2]

Xu, Jingjing [1]

Luescher, Christoph [1,2]

Schlueter, Ralf [1,2]

Ney, Hermann [1,2]

Affiliations:

[1] RWTH Aachen University, Department of Computer Science, Human Language Technology and Pattern Recognition, D-52074 Aachen, Germany

[2] AppTek GmbH, D-52062 Aachen, Germany

Source:

INTERSPEECH 2022 | 2022

Keywords:

speech recognition; conformer acoustic model; speaker adaptation; neural networks; speaker

DOI:

10.21437/Interspeech.2022-10723

Chinese Library Classification number:

O42 [Acoustics]

Discipline classification codes:

070206; 082403

Abstract:

Speaker adaptation is important for building robust automatic speech recognition (ASR) systems. In this work, we investigate various feature-space methods for speaker adaptive training (SAT) of a conformer-based acoustic model (AM) on the Switchboard 300h dataset. We propose a method called Weighted-Simple-Add, which adds weighted speaker information vectors to the input of the multi-head self-attention module of the conformer AM. Using this method for SAT, we achieve 3.5% and 4.5% relative improvement in word error rate (WER) on the CallHome part of Hub5'00 and Hub5'01, respectively. Moreover, we build on our previous work, in which we proposed a novel and competitive training recipe for a conformer-based hybrid AM. We extend and improve this recipe, achieving an 11% relative WER improvement on the Switchboard 300h Hub5'00 dataset. We also make the recipe more efficient by reducing the total number of parameters by 34% relative.
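The abstract describes Weighted-Simple-Add only at a high level. The following is a minimal sketch under stated assumptions, not the authors' implementation: it assumes the speaker information vector is a fixed per-speaker vector (for example an i-vector), that a linear projection `weight` maps it to the model dimension, and that the "weighting" is a single scalar `alpha`. The names `project`, `weighted_simple_add`, `weight`, and `alpha` are hypothetical. The weighted, projected speaker vector is added to every frame that enters the multi-head self-attention module.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def project(weight: Matrix, speaker_vec: Vector) -> Vector:
    """Hypothetical linear projection W @ s from speaker-vector dim k to model dim d."""
    return [sum(w_ij * s_j for w_ij, s_j in zip(row, speaker_vec)) for row in weight]


def weighted_simple_add(
    frames: Matrix, speaker_vec: Vector, weight: Matrix, alpha: float
) -> Matrix:
    """Add alpha * (W s) to each frame of the self-attention input (sketch).

    frames:      T x d sequence fed to the multi-head self-attention module
    speaker_vec: k-dim speaker information vector (assumed, e.g. an i-vector)
    weight:      d x k projection matrix (assumption)
    alpha:       scalar weight on the speaker term (assumption)
    """
    if not frames:
        return []
    bias = project(weight, speaker_vec)
    if len(bias) != len(frames[0]):
        raise ValueError("projected speaker vector must match the model dimension")
    # The same speaker term is broadcast over all T time frames.
    return [[x + alpha * b for x, b in zip(frame, bias)] for frame in frames]


if __name__ == "__main__":
    frames = [[1.0, 2.0], [3.0, 4.0]]          # T = 2 frames, d = 2
    speaker = [1.0, 0.0, 2.0]                  # k = 3
    W = [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]     # d x k
    print(weighted_simple_add(frames, speaker, W, alpha=0.5))
    # prints [[1.5, 3.0], [3.5, 5.0]]
```

Because the speaker term is constant over time, it acts as a per-speaker bias on the attention input. Whether the weight is learned, fixed, scalar, or per-dimension is not specified in the abstract.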


Pages: 1036-1040

Page count: 5
