Improving Language Modeling with an Adversarial Critic for Automatic Speech Recognition

Cited by: 2
Authors
Zhang, Yike [1 ,2 ]
Zhang, Pengyuan [1 ,2 ]
Yan, Yonghong [1 ,2 ,3 ]
Affiliations
[1] Chinese Acad Sci, Inst Acoust, Beijing 100864, Peoples R China
[2] Univ Chinese Acad Sci, Beijing, Peoples R China
[3] Chinese Acad Sci, Xinjiang Tech Inst Phys & Chem, Urumqi, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
speech recognition; language modeling; generative adversarial networks; policy gradients
DOI
10.21437/Interspeech.2018-1111
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Recurrent neural network language models (RNN LMs) trained via the maximum likelihood principle suffer from the exposure bias problem at inference time. As a result, potential recognition errors limit their performance when rescoring N-best lists of speech recognition outputs. Inspired by the generative adversarial net (GAN), this paper proposes a novel approach to alleviate this problem. We regard the RNN LM as a generative model in the training stage, and an auxiliary neural critic encourages the RNN LM to learn long-term dependencies by forcing it to generate valid sentences. Since the vanilla GAN has limitations when generating discrete sequences, the proposed framework is optimized with the policy gradient algorithm. Experiments were conducted on two Mandarin speech recognition tasks. Results show that the proposed method achieved lower character error rates on both datasets than the maximum likelihood method, although it slightly increased perplexity. Finally, we visualised sentences generated by the RNN LMs. The results demonstrate that the proposed method helps the RNN LM learn long-term dependencies and partly alleviates the exposure bias problem.
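The abstract's idea of updating a generator with a critic's reward through policy gradients can be illustrated with a toy REINFORCE loop. This is only a minimal sketch under strong assumptions and not the paper's implementation: the "generator" is a context-free softmax over a hypothetical three-token vocabulary rather than an RNN LM, and `toy_critic` is a fixed hand-written rule standing in for the learned neural critic. All names (`VOCAB`, `toy_critic`, `reinforce_step`, `train`) are invented for illustration.

```python
import math
import random

VOCAB = ["a", "b", "</s>"]  # hypothetical toy vocabulary


def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_sentence(logits, rng, max_len=5):
    """Sample token ids from the softmax policy until </s> or max_len."""
    probs = softmax(logits)
    tokens = []
    for _ in range(max_len):
        tok = rng.choices(range(len(VOCAB)), weights=probs)[0]
        tokens.append(tok)
        if VOCAB[tok] == "</s>":
            break
    return tokens


def toy_critic(tokens):
    """Stand-in critic (assumption): reward 1.0 if the sentence ends with </s>."""
    return 1.0 if VOCAB[tokens[-1]] == "</s>" else 0.0


def reinforce_step(logits, rng, lr=0.5, batch=32):
    """One REINFORCE update using the batch-mean reward as a baseline."""
    probs = softmax(logits)
    samples = [sample_sentence(logits, rng) for _ in range(batch)]
    rewards = [toy_critic(s) for s in samples]
    baseline = sum(rewards) / batch
    grad = [0.0] * len(logits)
    for tokens, reward in zip(samples, rewards):
        advantage = reward - baseline
        for tok in tokens:
            for k in range(len(logits)):
                # d log softmax(tok) / d logit_k = 1[k == tok] - p_k
                grad[k] += advantage * ((1.0 if k == tok else 0.0) - probs[k])
    # Gradient ascent on the expected critic reward.
    return [l + lr * g / batch for l, g in zip(logits, grad)]


def train(steps=200, seed=0):
    rng = random.Random(seed)
    logits = [0.0, 0.0, 0.0]
    for _ in range(steps):
        logits = reinforce_step(logits, rng)
    return softmax(logits)


if __name__ == "__main__":
    print(train())  # final token probabilities of the toy policy
```

Because sampling discrete tokens is not differentiable, the reward is never backpropagated through the samples; it only scales the gradient of the log-probability of what was sampled, which is the property that makes policy gradients usable for sequence generators.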
Pages: 3348-3352
Page count: 5