Improving Language Modeling with an Adversarial Critic for Automatic Speech Recognition

Cited by: 2

Authors:

Zhang, Yike [1,2]

Zhang, Pengyuan [1,2]

Yan, Yonghong [1,2,3]

Affiliations:

[1] Chinese Acad Sci, Inst Acoust, Beijing 100864, Peoples R China

[2] Univ Chinese Acad Sci, Beijing, Peoples R China

[3] Chinese Acad Sci, Xinjiang Tech Inst Phys & Chem, Urumqi, Peoples R China

Source:

19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES | 2018

Funding:

National Natural Science Foundation of China;

Keywords:

speech recognition; language modeling; generative adversarial networks; policy gradients;

DOI:

10.21437/Interspeech.2018-1111

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Recurrent neural network language models (RNN LMs) trained via the maximum likelihood principle suffer from the exposure bias problem in the inference stage. As a result, potential recognition errors limit their performance when rescoring N-best lists of speech recognition outputs. Inspired by the generative adversarial net (GAN), this paper proposes a novel approach to alleviate this problem. We regard the RNN LM as a generative model in the training stage, and use an auxiliary neural critic to encourage the RNN LM to learn long-term dependencies by forcing it to generate valid sentences. Since the vanilla GAN has limitations when generating discrete sequences, the proposed framework is optimized through the policy gradient algorithm. Experiments were conducted on two Mandarin speech recognition tasks. Results show that the proposed method achieved lower character error rates on both datasets than the maximum likelihood method, although it increased perplexities slightly. Finally, we visualised the sentences generated by the RNN LMs. The results demonstrate that the proposed method helps the RNN LM learn long-term dependencies and partly alleviates the exposure bias problem.
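The abstract's point that discrete sequences call for a policy gradient update can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the "generator" is a context-free softmax over a four-token vocabulary rather than an RNN LM, and `toy_critic` is a fixed hypothetical scoring function standing in for a trained neural critic. The sketch shows only the general idea: sample a sentence, score it with the critic, and use the score to weight the gradient of the log-probability of the sampled tokens, since sampling itself is not differentiable.

```python
import math
import random


def softmax(logits):
    """Convert a list of logits into probabilities."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_sentence(logits, length, rng):
    """Sample a fixed-length token sequence from the toy generator."""
    probs = softmax(logits)
    return rng.choices(range(len(logits)), weights=probs, k=length)


def toy_critic(sentence):
    """Hypothetical critic (an assumption): fraction of tokens equal to 0.

    A real critic would be a trained network scoring sentence validity.
    """
    return sum(1 for tok in sentence if tok == 0) / len(sentence)


def reinforce_step(logits, length, rng, lr=0.5, batch=32):
    """One REINFORCE update: reward-weighted gradient of log-probabilities."""
    probs = softmax(logits)
    samples = [sample_sentence(logits, length, rng) for _ in range(batch)]
    rewards = [toy_critic(s) for s in samples]
    baseline = sum(rewards) / batch  # mean reward, used to reduce variance
    grad = [0.0] * len(logits)
    for sent, reward in zip(samples, rewards):
        advantage = reward - baseline
        for tok in sent:
            for k in range(len(logits)):
                # d log p(tok) / d logit_k = 1[k == tok] - p_k
                indicator = 1.0 if k == tok else 0.0
                grad[k] += advantage * (indicator - probs[k])
    # Gradient ascent on the expected critic score.
    return [w + lr * g / batch for w, g in zip(logits, grad)]


def train(steps=200, vocab=4, length=5, seed=0):
    """Train the toy generator and return its final token probabilities."""
    rng = random.Random(seed)
    logits = [0.0] * vocab
    for _ in range(steps):
        logits = reinforce_step(logits, length, rng)
    return softmax(logits)
```

Calling `train()` returns the generator's token distribution after training; probability mass should shift toward token 0, the token the hypothetical critic rewards. In an adversarial setup the critic would be updated alongside the generator, which this sketch omits.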


Pages: 3348-3352

Number of pages: 5
