Improving the Training Recipe for a Robust Conformer-based Hybrid Model

Cited by: 2
Authors
Zeineldeen, Mohammad [1, 2]
Xu, Jingjing [1]
Luescher, Christoph [1, 2]
Schlueter, Ralf [1, 2]
Ney, Hermann [1, 2]
Affiliations
[1] Rhein Westfal TH Aachen, Dept Comp Sci, Human Language Technol & Pattern Recognit, D-52074 Aachen, Germany
[2] AppTek GmbH, D-52062 Aachen, Germany
Source
Keywords
speech recognition; conformer acoustic model; speaker adaptation; NEURAL-NETWORKS; SPEAKER;
DOI
10.21437/Interspeech.2022-10723
Chinese Library Classification number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Speaker adaptation is important for building robust automatic speech recognition (ASR) systems. In this work, we investigate various feature-space methods for speaker adaptive training (SAT) of a conformer-based acoustic model (AM) on the Switchboard 300h dataset. We propose a method, called Weighted-Simple-Add, which adds weighted speaker information vectors to the input of the multi-head self-attention module of the conformer AM. Using this method for SAT, we achieve 3.5% and 4.5% relative improvement in word error rate (WER) on the CallHome part of Hub5'00 and Hub5'01, respectively. Moreover, we build on our previous work, in which we proposed a novel and competitive training recipe for a conformer-based hybrid AM. We extend and improve this recipe, achieving an 11% relative improvement in WER on the Switchboard 300h Hub5'00 dataset. We also make this recipe more efficient by reducing the total number of parameters by 34% relative.
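The abstract describes Weighted-Simple-Add only at a high level: weighted speaker information vectors are added to the input of the multi-head self-attention module. The following is a minimal NumPy sketch of that idea, not the authors' implementation. The linear projection `proj`, the scalar `weight`, the function name, and all dimensions are assumptions made for illustration; the abstract does not say how the speaker vector is projected or how the weight is chosen.

```python
import numpy as np


def weighted_simple_add(x, speaker_vec, proj, weight):
    """Add a weighted speaker vector to every frame of the attention input.

    x           : (T, d_model) input to the self-attention module
    speaker_vec : (d_spk,) speaker information vector (e.g. an i-vector; assumed)
    proj        : (d_spk, d_model) projection matrix (hypothetical, for illustration)
    weight      : scalar weight applied to the projected speaker vector (assumed)
    """
    spk = speaker_vec @ proj      # project to model dimension: (d_model,)
    return x + weight * spk       # broadcast the same offset over all T frames


# Toy dimensions chosen only for this example.
rng = np.random.default_rng(0)
T, d_model, d_spk = 4, 8, 3
x = rng.standard_normal((T, d_model))
speaker_vec = rng.standard_normal(d_spk)
proj = rng.standard_normal((d_spk, d_model))

out = weighted_simple_add(x, speaker_vec, proj, weight=0.5)
```

In this sketch every frame receives the same speaker-dependent offset, so the output keeps the shape of `x` and the rest of the attention module is unchanged.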
Pages: 1036-1040
Number of pages: 5