Improving the Training Recipe for a Robust Conformer-based Hybrid Model

Cited by: 2

Authors:

Zeineldeen, Mohammad [1,2]

Xu, Jingjing [1]

Luescher, Christoph [1,2]

Schlueter, Ralf [1,2]

Ney, Hermann [1,2]

Affiliations:

[1] Rhein Westfal TH Aachen, Dept Comp Sci, Human Language Technol & Pattern Recognit, D-52074 Aachen, Germany

[2] AppTek GmbH, D-52062 Aachen, Germany

Source:

INTERSPEECH 2022 | 2022

Keywords:

speech recognition; conformer acoustic model; speaker adaptation; NEURAL-NETWORKS; SPEAKER;

DOI:

10.21437/Interspeech.2022-10723

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Discipline classification codes:

070206 ; 082403 ;

Abstract:

Speaker adaptation is important to build robust automatic speech recognition (ASR) systems. In this work, we investigate various methods for speaker adaptive training (SAT) based on feature-space approaches for a conformer-based acoustic model (AM) on the Switchboard 300h dataset. We propose a method, called Weighted-Simple-Add, which adds weighted speaker information vectors to the input of the multi-head self-attention module of the conformer AM. Using this method for SAT, we achieve 3.5% and 4.5% relative improvement in terms of word error rate (WER) on the CallHome part of Hub5'00 and Hub5'01, respectively. Moreover, we build on top of our previous work, where we proposed a novel and competitive training recipe for a conformer-based hybrid AM. We extend and improve this recipe, achieving 11% relative improvement in terms of WER on the Switchboard 300h Hub5'00 dataset. We also make this recipe efficient by reducing the total number of parameters by 34% relative.
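
The abstract only states that Weighted-Simple-Add adds weighted speaker information vectors to the input of the multi-head self-attention module. As a rough illustration of that idea, and not the authors' implementation, the sketch below assumes a fixed speaker vector (for example an i-vector), a hypothetical linear projection `proj` to the model dimension, and a single scalar `weight`; all names and shapes here are assumptions.

```python
def weighted_simple_add(frames, speaker_vec, proj, weight):
    """Add a weighted, projected speaker vector to every frame.

    frames:      list of T frames, each a list of d floats (assumed MHSA input)
    speaker_vec: list of k floats (assumed speaker representation, e.g. an i-vector)
    proj:        k x d matrix as nested lists (hypothetical linear projection)
    weight:      scalar applied to the projected speaker vector (assumed)
    """
    d = len(frames[0])
    # Project the speaker vector to the model dimension d.
    projected = [
        sum(speaker_vec[i] * proj[i][j] for i in range(len(speaker_vec)))
        for j in range(d)
    ]
    # Add the same weighted speaker vector to each time frame.
    return [[x + weight * p for x, p in zip(frame, projected)] for frame in frames]


# Toy example: T = 2 frames, d = 2 model dimensions, k = 3 speaker dimensions.
frames = [[1.0, 2.0], [3.0, 4.0]]
speaker_vec = [1.0, 0.0, 2.0]
proj = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
out = weighted_simple_add(frames, speaker_vec, proj, weight=0.5)
print(out)  # [[2.0, 2.5], [4.0, 4.5]]
```

In a real model the projection and the weight would presumably be learned together with the acoustic model; the abstract does not specify this, so the sketch keeps them as plain inputs.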


Pages: 1036-1040

Page count: 5
