Improving the Training Recipe for a Robust Conformer-based Hybrid Model

Cited by: 2

Authors
Zeineldeen, Mohammad [1, 2]
Xu, Jingjing [1]
Luescher, Christoph [1, 2]
Schlueter, Ralf [1, 2]
Ney, Hermann [1, 2]
Affiliations
[1] RWTH Aachen University, Department of Computer Science, Human Language Technology and Pattern Recognition, D-52074 Aachen, Germany
[2] AppTek GmbH, D-52062 Aachen, Germany
Source
Interspeech 2022
Keywords
speech recognition; conformer acoustic model; speaker adaptation; neural networks; speaker
DOI
10.21437/Interspeech.2022-10723
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Speaker adaptation is important for building robust automatic speech recognition (ASR) systems. In this work, we investigate various feature-space approaches to speaker adaptive training (SAT) of a conformer-based acoustic model (AM) on the Switchboard 300h dataset. We propose a method, called Weighted-Simple-Add, which adds weighted speaker information vectors to the input of the multi-head self-attention module of the conformer AM. Using this method for SAT, we achieve 3.5% and 4.5% relative improvement in word error rate (WER) on the CallHome part of Hub5'00 and Hub5'01, respectively. Moreover, we build on our previous work, in which we proposed a novel and competitive training recipe for a conformer-based hybrid AM. We extend and improve this recipe, achieving an 11% relative WER improvement on the Switchboard 300h Hub5'00 dataset. We also make the recipe more efficient by reducing the total number of parameters by 34% relative.
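The Weighted-Simple-Add idea described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes an utterance-level speaker vector (for example an i-vector) of dimension k, a hypothetical projection matrix `proj` of shape d x k that maps it to the model dimension d, and a hypothetical scalar weight `alpha`. The record does not say how the weighting is parameterized, where exactly inside the block it is applied, or how it is trained, so those details are assumptions. The function name `weighted_simple_add` is illustrative only.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(w: Matrix, v: Vector) -> Vector:
    """Multiply a (d x k) matrix, given as a list of rows, by a length-k vector."""
    return [sum(w_ij * v_j for w_ij, v_j in zip(row, v)) for row in w]


def weighted_simple_add(
    frames: List[Vector], speaker_vec: Vector, proj: Matrix, alpha: float
) -> List[Vector]:
    """Hypothetical sketch of Weighted-Simple-Add.

    Projects the speaker vector to the model dimension, scales it by the
    (assumed scalar) weight alpha, and adds the result to every frame that
    would be fed into the multi-head self-attention (MHSA) module.
    """
    if any(len(row) != len(speaker_vec) for row in proj):
        raise ValueError("proj must have one column per speaker-vector entry")
    if any(len(frame) != len(proj) for frame in frames):
        raise ValueError("each frame must have one entry per row of proj")
    # The same speaker offset is shared by all frames of the utterance.
    offset = [alpha * x for x in matvec(proj, speaker_vec)]
    return [[h + o for h, o in zip(frame, offset)] for frame in frames]


if __name__ == "__main__":
    frames = [[0.0, 1.0], [2.0, 3.0]]          # T=2 frames, model dim d=2
    ivec = [1.0, -1.0, 0.5]                    # speaker vector, k=3
    proj = [[1.0, 0.0, 0.0], [0.0, 1.0, 2.0]]  # d x k projection
    print(weighted_simple_add(frames, ivec, proj, alpha=0.5))
    # → [[0.5, 1.0], [2.5, 3.0]]
```

With `alpha = 0` the frames pass through unchanged, so in this sketch the weight controls how strongly the speaker information shifts the self-attention input. In an actual model, `proj` (and possibly the weight) would presumably be learned jointly with the AM during SAT; that training step is not shown here.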
Pages: 1036-1040
Number of pages: 5