Improving Language Modeling with an Adversarial Critic for Automatic Speech Recognition

Cited by: 2
Authors
Zhang, Yike [1, 2]
Zhang, Pengyuan [1, 2]
Yan, Yonghong [1, 2, 3]
Affiliations
[1] Chinese Acad Sci, Inst Acoust, Beijing 100864, Peoples R China
[2] Univ Chinese Acad Sci, Beijing, Peoples R China
[3] Chinese Acad Sci, Xinjiang Tech Inst Phys & Chem, Urumqi, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
speech recognition; language modeling; generative adversarial networks; policy gradients;
DOI
10.21437/Interspeech.2018-1111
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Recurrent neural network language models (RNN LMs) trained via the maximum likelihood principle suffer from the exposure bias problem in the inference stage. Therefore, potential recognition errors limit their performance in rescoring N-best lists of speech recognition outputs. Inspired by the generative adversarial net (GAN), this paper proposes a novel approach to alleviate this problem. We regard the RNN LM as a generative model in the training stage, and an auxiliary neural critic encourages the RNN LM to learn long-term dependencies by forcing it to generate valid sentences. Since the vanilla GAN has limitations when generating discrete sequences, the proposed framework is optimized through the policy gradient algorithm. Experiments were conducted on two Mandarin speech recognition tasks. Results show that the proposed method achieved lower character error rates on both datasets than the maximum likelihood method, although it increased perplexities slightly. Finally, we visualised the sentences generated from the RNN LMs. The results demonstrate that the proposed method helps the RNN LM learn long-term dependencies and partly alleviates the exposure bias problem.
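The policy gradient idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the tabular generator, the three-token vocabulary, the fixed `toy_critic` reward, and all function names are hypothetical stand-ins chosen for brevity, whereas the paper uses an RNN LM as the generator and a neural critic trained adversarially. The sketch only shows why a REINFORCE-style update works with discrete samples: the reward from the critic scales the gradient of the log-probability of each sampled token, so no gradient has to flow through the sampling step.

```python
import math
import random

VOCAB = ["a", "b", "</s>"]  # toy vocabulary (hypothetical)
MAX_LEN = 4                 # maximum number of sampled tokens


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_sentence(logits, rng):
    """Sample token ids; row 0 is the start state, row i + 1 follows token i."""
    prev, tokens = 0, []
    for _ in range(MAX_LEN):
        probs = softmax(logits[prev])
        tok = rng.choices(range(len(VOCAB)), weights=probs)[0]
        tokens.append(tok)
        if VOCAB[tok] == "</s>":
            break
        prev = tok + 1
    return tokens


def toy_critic(tokens):
    """Fixed stand-in for a learned critic: reward only the sentence 'a b </s>'."""
    return 1.0 if tokens == [0, 1, 2] else 0.0


def sentence_prob(logits, tokens):
    """Probability the generator assigns to a given token sequence."""
    prev, prob = 0, 1.0
    for tok in tokens:
        prob *= softmax(logits[prev])[tok]
        prev = tok + 1
    return prob


def reinforce_step(logits, rng, baseline, lr=0.5):
    """One REINFORCE update: logits += lr * (reward - baseline) * grad log p."""
    tokens = sample_sentence(logits, rng)
    reward = toy_critic(tokens)
    advantage = reward - baseline
    prev = 0
    for tok in tokens:
        probs = softmax(logits[prev])
        for k in range(len(VOCAB)):
            # d log softmax(logits)[tok] / d logits[k] = 1[k == tok] - probs[k]
            grad = (1.0 if k == tok else 0.0) - probs[k]
            logits[prev][k] += lr * advantage * grad
        prev = tok + 1
    return reward


def train(steps=3000, seed=0):
    """Train the toy generator against the fixed critic and return its logits."""
    rng = random.Random(seed)
    logits = [[0.0] * len(VOCAB) for _ in range(len(VOCAB) + 1)]
    baseline = 0.0
    for _ in range(steps):
        reward = reinforce_step(logits, rng, baseline)
        # A running-average baseline reduces the variance of the update.
        baseline = 0.9 * baseline + 0.1 * reward
    return logits
```

Calling `sentence_prob(train(), [0, 1, 2])` gives the probability the trained toy generator assigns to the rewarded sentence, which starts at 1/27 under the uniform initialisation. In an adversarial setup the fixed reward would be replaced by the score of a critic that is itself updated to separate real sentences from generated ones.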
Pages: 3348-3352
Number of pages: 5