Improving Language Modeling with an Adversarial Critic for Automatic Speech Recognition

Cited by: 2
Authors
Zhang, Yike [1 ,2 ]
Zhang, Pengyuan [1 ,2 ]
Yan, Yonghong [1 ,2 ,3 ]
Affiliations
[1] Chinese Acad Sci, Inst Acoust, Beijing 100864, Peoples R China
[2] Univ Chinese Acad Sci, Beijing, Peoples R China
[3] Chinese Acad Sci, Xinjiang Tech Inst Phys & Chem, Urumqi, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
speech recognition; language modeling; generative adversarial networks; policy gradients;
DOI
10.21437/Interspeech.2018-1111
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Recurrent neural network language models (RNN LMs) trained via the maximum likelihood principle suffer from the exposure bias problem at inference time: they are trained only on correct histories, yet must score hypotheses that contain errors. As a result, potential recognition errors limit their performance when rescoring N-best lists of speech recognition outputs. Inspired by the generative adversarial net (GAN), this paper proposes a novel approach to alleviate this problem. We regard the RNN LM as a generative model in the training stage. An auxiliary neural critic is used to encourage the RNN LM to learn long-term dependencies by forcing it to generate valid sentences. Since the vanilla GAN has limitations when generating discrete sequences, the proposed framework is optimized through the policy gradient algorithm. Experiments were conducted on two Mandarin speech recognition tasks. Results show that the proposed method achieved lower character error rates on both datasets than the maximum likelihood method, although it increased perplexities slightly. Finally, we visualised the sentences generated from RNN LMs. The results demonstrate that the proposed method helps the RNN LM to learn long-term dependencies and partly alleviates the exposure bias problem.
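The policy gradient step mentioned in the abstract can be illustrated with a small toy sketch. It is not the authors' implementation: the generator here is a tabular bigram policy rather than an RNN LM, and `toy_critic` is a fixed, hand-written reward function standing in for the adversarially trained neural critic. All names and constants (`VOCAB`, `LENGTH`, `train_toy_generator`, the learning rate, the running-mean baseline) are assumptions made for illustration. The sketch only shows why a sequence-level score can train a generator of discrete tokens: the gradient flows through the log-probability of the sampled tokens, not through the tokens themselves.

```python
import math
import random

VOCAB = 3       # toy vocabulary size (assumption)
LENGTH = 5      # tokens per generated sentence (assumption)
START = VOCAB   # index of the extra "start of sentence" context row


def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_sentence(logits, rng):
    """Sample tokens one at a time, each conditioned on the previous token."""
    prev, tokens = START, []
    for _ in range(LENGTH):
        probs = softmax(logits[prev])
        tok = rng.choices(range(VOCAB), weights=probs)[0]
        tokens.append(tok)
        prev = tok
    return tokens


def toy_critic(tokens):
    """Hypothetical stand-in for the critic: fraction of 'valid' transitions.

    A transition counts as valid when the next token is (prev + 1) mod VOCAB.
    """
    good = sum(1 for a, b in zip(tokens, tokens[1:]) if b == (a + 1) % VOCAB)
    return good / (len(tokens) - 1)


def mean_reward(logits, rng, n=500):
    """Monte Carlo estimate of the critic score of sampled sentences."""
    return sum(toy_critic(sample_sentence(logits, rng)) for _ in range(n)) / n


def train_toy_generator(seed=0, episodes=5000, lr=0.2):
    """REINFORCE with a running-mean baseline on the tabular policy."""
    rng = random.Random(seed)
    logits = [[0.0] * VOCAB for _ in range(VOCAB + 1)]  # uniform start
    before = mean_reward(logits, rng)
    baseline = 0.0
    for _ in range(episodes):
        tokens = sample_sentence(logits, rng)
        reward = toy_critic(tokens)          # one score for the whole sentence
        advantage = reward - baseline
        baseline += 0.05 * (reward - baseline)
        # Accumulate d log p(tokens) / d logits, then apply a single update.
        grads = [[0.0] * VOCAB for _ in range(VOCAB + 1)]
        prev = START
        for tok in tokens:
            probs = softmax(logits[prev])
            for k in range(VOCAB):
                grads[prev][k] += (1.0 if k == tok else 0.0) - probs[k]
            prev = tok
        for row in range(VOCAB + 1):
            for k in range(VOCAB):
                logits[row][k] += lr * advantage * grads[row][k]
    after = mean_reward(logits, rng)
    return before, after


if __name__ == "__main__":
    before, after = train_toy_generator()
    print(f"mean critic score before: {before:.3f}, after: {after:.3f}")
```

In this sketch the reward is computed only once the full sentence has been sampled, which mirrors the sequence-level feedback described in the abstract. In an adversarial setup the critic would itself be trained to separate real sentences from generated ones, and that part is omitted here.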
Pages: 3348 - 3352
Number of pages: 5
Related papers
50 in total
  • [31] Automatic emotional speech recognition in Serbian language
    Bojanic, Milana
    Delic, Vlado
    2013 21ST TELECOMMUNICATIONS FORUM (TELFOR), 2013, : 459 - 465
  • [32] RELEVANCE LANGUAGE MODELING FOR SPEECH RECOGNITION
    Chen, Kuan-Yu
    Chen, Berlin
    2011 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2011, : 5568 - 5571
  • [33] Imperceptible, Robust, and Targeted Adversarial Examples for Automatic Speech Recognition
    Qin, Yao
    Carlini, Nicholas
    Goodfellow, Ian
    Cottrell, Garrison
    Raffel, Colin
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 97, 2019, 97
  • [34] Effective Adversarial Sample Detection for Securing Automatic Speech Recognition
    Lin, Chih-Yang
    Wang, Yan-Zhang
    Lin, Shou-Kuan
    Farady, Isack
    Jan, Yih-Kuen
    Lin, Wei-Yang
    2024 IEEE INTERNATIONAL CONFERENCE ON ADVANCED VIDEO AND SIGNAL BASED SURVEILLANCE, AVSS 2024, 2024,
  • [35] Improving language models for radiology speech recognition
    Paulett, John M.
    Langlotz, Curtis P.
    JOURNAL OF BIOMEDICAL INFORMATICS, 2009, 42 (01) : 53 - 58
  • [36] Improving Speech Emotion Recognition With Adversarial Data Augmentation Network
    Yi, Lu
    Mak, Man-Wai
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2022, 33 (01) : 172 - 184
  • [37] Improving speech recognition systems for the morphologically complex Malayalam language using subword tokens for language modeling
    Manohar, Kavya
    Jayan, A. R.
    Rajan, Rajeev
    EURASIP JOURNAL ON AUDIO SPEECH AND MUSIC PROCESSING, 2023, 2023 (01)
  • [38] Improving speech recognition systems for the morphologically complex Malayalam language using subword tokens for language modeling
    Kavya Manohar
    Jayan A R
    Rajeev Rajan
    EURASIP Journal on Audio, Speech, and Music Processing, 2023
  • [39] Automatic Speech Recognition System Channel Modeling
    Tan, Qun Feng
    Audhkhasi, Kartik
    Georgiou, Panayiotis G.
    Ettelaie, Emil
    Narayanan, Shrikanth
    11TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2010 (INTERSPEECH 2010), VOLS 3 AND 4, 2010, : 2442 - 2445
  • [40] Improving N-gram Language Modeling for Code-switching Speech Recognition
    Zeng, Zhiping
    Xu, Haihua
    Chong, Tze Yuang
    Chng, Eng-Siong
    Li, Haizhou
    2017 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC 2017), 2017, : 1546 - 1551