Improving the Training Recipe for a Robust Conformer-based Hybrid Model

Cited by: 2
Authors
Zeineldeen, Mohammad [1, 2]
Xu, Jingjing [1]
Luescher, Christoph [1, 2]
Schlueter, Ralf [1, 2]
Ney, Hermann [1, 2]
Affiliations
[1] Rhein Westfal TH Aachen, Dept Comp Sci, Human Language Technol & Pattern Recognit, D-52074 Aachen, Germany
[2] AppTek GmbH, D-52062 Aachen, Germany
Source
Interspeech 2022
Keywords
speech recognition; conformer acoustic model; speaker adaptation; NEURAL-NETWORKS; SPEAKER;
DOI
10.21437/Interspeech.2022-10723
Chinese Library Classification number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Speaker adaptation is important for building robust automatic speech recognition (ASR) systems. In this work, we investigate various methods for speaker adaptive training (SAT) based on feature-space approaches for a conformer-based acoustic model (AM) on the Switchboard 300h dataset. We propose a method, called Weighted-Simple-Add, which adds weighted speaker information vectors to the input of the multi-head self-attention module of the conformer AM. Using this method for SAT, we achieve 3.5% and 4.5% relative improvement in word error rate (WER) on the CallHome part of Hub5'00 and on Hub5'01, respectively. Moreover, we build on our previous work, in which we proposed a novel and competitive training recipe for a conformer-based hybrid AM. We extend and improve this recipe, achieving an 11% relative improvement in WER on the Switchboard 300h Hub5'00 dataset. We also make the recipe more efficient by reducing the total number of parameters by 34% relative.
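The abstract describes Weighted-Simple-Add only at a high level: a weighted speaker information vector is added to the input of the multi-head self-attention module. The following is a minimal sketch of that idea, not the authors' implementation. The linear projection `proj`, the scalar `weight`, and all function and variable names are assumptions introduced here for illustration; the abstract does not specify how the weighting or the dimension matching is done.

```python
from typing import List

Vector = List[float]
Matrix = List[Vector]


def project(matrix: Matrix, vec: Vector) -> Vector:
    """Multiply a (d_model x d_spk) matrix by a d_spk-dimensional vector."""
    return [sum(w * v for w, v in zip(row, vec)) for row in matrix]


def weighted_simple_add(frames: Matrix, speaker_vec: Vector,
                        proj: Matrix, weight: float) -> Matrix:
    """Add the same weighted, projected speaker vector to every frame.

    frames:      T x d_model input to the self-attention module
    speaker_vec: one d_spk-dimensional speaker information vector
    proj:        hypothetical d_model x d_spk projection (assumption)
    weight:      hypothetical scalar weight (assumption)
    """
    spk = project(proj, speaker_vec)
    if frames and len(frames[0]) != len(spk):
        raise ValueError("projected speaker vector must match frame dimension")
    return [[x + weight * s for x, s in zip(frame, spk)] for frame in frames]


# Toy example: two frames of dimension 2, a speaker vector of dimension 3.
frames = [[1.0, 2.0], [3.0, 4.0]]
speaker_vec = [1.0, 0.0, 2.0]
proj = [[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]]
adapted = weighted_simple_add(frames, speaker_vec, proj, weight=0.5)
print(adapted)
```

In this sketch the speaker vector is constant across all frames of an utterance, so the addition shifts every frame by the same offset before attention is computed. In a trained model the projection and weight would presumably be learned; here they are fixed toy values.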
Pages: 1036-1040
Number of pages: 5