BAYESIAN TRANSFORMER LANGUAGE MODELS FOR SPEECH RECOGNITION

Cited by: 12
Authors
Xue, Boyang [1]
Yu, Jianwei [1]
Xu, Junhao [1]
Liu, Shansong [1]
Hu, Shoukang [1]
Ye, Zi [1]
Geng, Mengzhe [1]
Liu, Xunying [1]
Meng, Helen [1]
Affiliations
[1] Chinese Univ Hong Kong, Hong Kong, Peoples R China
Keywords
neural language models; Transformer; Bayesian learning; model uncertainty; speech recognition;
DOI
10.1109/ICASSP39728.2021.9414046
Chinese Library Classification number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
State-of-the-art neural language models (LMs) represented by Transformers are highly complex. Their use of fixed, deterministic parameter estimates fails to account for model uncertainty and leads to over-fitting and poor generalization when given limited training data. To address these issues, this paper proposes a full Bayesian learning framework for Transformer LM estimation. Efficient variational inference based approaches are used to estimate the latent parameter posterior distributions associated with different parts of the Transformer model architecture, including the multi-head self-attention, feed-forward and embedding layers. Statistically significant word error rate (WER) reductions of up to 0.5% absolute (3.18% relative) and consistent perplexity gains were obtained over the baseline Transformer LMs on state-of-the-art Switchboard corpus trained LF-MMI factored TDNN systems with i-Vector speaker adaptation. Performance improvements were also obtained on a cross-domain LM adaptation task that required porting a Transformer LM trained on the Switchboard and Fisher data to a low-resource DementiaBank elderly speech corpus.
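The variational treatment of layer parameters mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the factorized Gaussian posterior, the standard normal prior, the softplus parameterization of the standard deviation, and all names (`softplus`, `sample_weights`, `kl_to_standard_normal`, `linear`) are assumptions chosen for illustration. Each weight is drawn as mean plus standard deviation times noise, and a KL term measures how far the posterior has moved from the prior; a typical variational objective adds this KL term to the usual negative log-likelihood.

```python
import math
import random


def softplus(rho):
    """Map an unconstrained value to a positive standard deviation."""
    return math.log1p(math.exp(rho))


def sample_weights(mu, rho, rng):
    """Draw one weight matrix via w = mu + sigma * eps, with eps ~ N(0, 1)."""
    return [
        [m + softplus(r) * rng.gauss(0.0, 1.0) for m, r in zip(mu_row, rho_row)]
        for mu_row, rho_row in zip(mu, rho)
    ]


def kl_to_standard_normal(mu, rho):
    """Sum of KL(N(mu, sigma^2) || N(0, 1)) over every weight (assumed prior)."""
    total = 0.0
    for mu_row, rho_row in zip(mu, rho):
        for m, r in zip(mu_row, rho_row):
            s = softplus(r)
            total += 0.5 * (s * s + m * m - 1.0) - math.log(s)
    return total


def linear(x, w):
    """Apply a d_in x d_out weight matrix to one input vector of length d_in."""
    return [sum(xi * w[i][j] for i, xi in enumerate(x)) for j in range(len(w[0]))]


# Hypothetical 4 -> 3 layer whose posterior starts exactly at the prior.
rng = random.Random(0)
d_in, d_out = 4, 3
rho_unit = math.log(math.e - 1.0)  # softplus(rho_unit) equals 1
mu = [[0.0] * d_out for _ in range(d_in)]
rho = [[rho_unit] * d_out for _ in range(d_in)]

y = linear([1.0] * d_in, sample_weights(mu, rho, rng))  # one stochastic forward pass
kl = kl_to_standard_normal(mu, rho)                     # close to zero: posterior equals prior
```

Because the weights are resampled on every call, repeated forward passes give different outputs; averaging over several draws is one common way to approximate the predictive distribution at test time.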
Pages: 7378-7382
Number of pages: 5