Structured Output Layer Neural Network Language Models for Speech Recognition

Cited by: 49

Authors:

Le, Hai-Son [1,2]

Oparin, Ilya [3]

Allauzen, Alexandre [1,2]

Gauvain, Jean-Luc [3]

Yvon, Francois [1,2]

Affiliations:

[1] Univ Paris 11, F-91405 Orsay, France

[2] LIMSI CNRS, F-91405 Orsay, France

[3] LIMSI CNRS, F-91403 Orsay, France

Source:

IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING | 2013 / Vol. 21 / Issue 01

Keywords:

Automatic speech recognition; neural network language model; speech-to-text

DOI:

10.1109/TASL.2012.2215599

Chinese Library Classification (CLC) number:

O42 [Acoustics]

Discipline classification codes:

070206; 082403

Abstract:

This paper extends a novel neural network language model (NNLM) which relies on word clustering to structure the output vocabulary: Structured OUtput Layer (SOUL) NNLM. This model is able to handle arbitrarily-sized vocabularies, hence dispensing with the need for shortlists that are commonly used in NNLMs. Several softmax layers replace the standard output layer in this model. The output structure depends on the word clustering which is based on the continuous word representation determined by the NNLM. Mandarin and Arabic data are used to evaluate the SOUL NNLM accuracy via speech-to-text experiments. Well tuned speech-to-text systems (with error rates around 10%) serve as the baselines. The SOUL model achieves consistent improvements over a classical shortlist NNLM both in terms of perplexity and recognition accuracy for these two languages that are quite different in terms of their internal structure and recognition vocabulary size. An enhanced training scheme is proposed that allows more data to be used at each training iteration of the neural network.
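The idea of replacing one large output softmax with several smaller ones organized by word clusters can be illustrated with a simplified sketch. This is not the authors' implementation: it assumes a flat two-level factorization P(w | h) = P(c | h) · P(w | c, h), and the class names, vocabulary, and scores below are hypothetical toy values standing in for the network's output activations for a single context h.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical toy vocabulary, partitioned into word classes (clusters).
CLASSES = {
    "animals": ["cat", "dog"],
    "verbs": ["runs", "sleeps", "eats"],
}


def word_probability(word, class_scores, word_scores):
    """Return P(word | h) = P(class | h) * P(word | class, h)."""
    class_names = list(CLASSES)
    # First softmax: distribution over classes.
    class_probs = softmax([class_scores[c] for c in class_names])
    for c, p_class in zip(class_names, class_probs):
        members = CLASSES[c]
        if word in members:
            # Second softmax: distribution over words inside that class only.
            in_class = softmax([word_scores[w] for w in members])
            return p_class * in_class[members.index(word)]
    raise KeyError(word)


# Made-up activations for one context h (assumed values, not real model output).
class_scores = {"animals": 1.0, "verbs": 0.0}
word_scores = {"cat": 0.5, "dog": -0.5, "runs": 0.2, "sleeps": 0.0, "eats": -0.3}

vocab = [w for members in CLASSES.values() for w in members]
probs = {w: word_probability(w, class_scores, word_scores) for w in vocab}
total = sum(probs.values())  # the factorized probabilities still sum to 1
```

Each word's probability requires normalizing only over the classes and over the words of one class, rather than over the whole vocabulary, which is why this kind of structure scales to large vocabularies without a shortlist.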


Pages: 195-204

Page count: 10

50 records in total

[1] STRUCTURED OUTPUT LAYER NEURAL NETWORK LANGUAGE MODEL
Le, Hai-Son
Oparin, Ilya
Allauzen, Alexandre
Gauvain, Jean-Luc
Yvon, Francois
2011 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2011, : 5524 - 5527
[2] Comparison of Various Neural Network Language Models in Speech Recognition
Zuo, Lingyun
Liu, Jian
Wan, Xin
2016 3RD INTERNATIONAL CONFERENCE ON INFORMATION SCIENCE AND CONTROL ENGINEERING (ICISCE), 2016, : 894 - 898
[3] RECURRENT NEURAL NETWORK LANGUAGE MODEL WITH STRUCTURED WORD EMBEDDINGS FOR SPEECH RECOGNITION
He, Tianxing
Xiang, Xu
Qian, Yanmin
Yu, Kai
2015 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING (ICASSP), 2015, : 5396 - 5400
[4] Investigating Bidirectional Recurrent Neural Network Language Models for Speech Recognition
Chen, X.
Ragni, A.
Liu, X.
Gales, M. J. F.
18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION, 2017, : 269 - 273
[5] BIDIRECTIONAL RECURRENT NEURAL NETWORK LANGUAGE MODELS FOR AUTOMATIC SPEECH RECOGNITION
Arisoy, Ebru
Sethy, Abhinav
Ramabhadran, Bhuvana
Chen, Stanley
2015 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING (ICASSP), 2015, : 5421 - 5425
[6] Empirical study of neural network language models for Arabic speech recognition
Emami, Ahmad
Mangu, Lidia
2007 IEEE WORKSHOP ON AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING, VOLS 1 AND 2, 2007, : 147 - 152
[7] Exploiting Future Word Contexts in Neural Network Language Models for Speech Recognition
Chen, Xie
Liu, Xunying
Wang, Yu
Ragni, Anton
Wong, Jeremy H. M.
Gales, Mark J. F.
IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2019, 27 (09) : 1444 - 1454
[8] SEMANTIC WORD EMBEDDING NEURAL NETWORK LANGUAGE MODELS FOR AUTOMATIC SPEECH RECOGNITION
Audhkhasi, Kartik
Sethy, Abhinav
Ramabhadran, Bhuvana
2016 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING PROCEEDINGS, 2016, : 5995 - 5999
[9] Latent Words Recurrent Neural Network Language Models for Automatic Speech Recognition
Masumura, Ryo
Asami, Taichi
Oba, Takanobu
Sakauchi, Sumitaka
Ito, Akinori
IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2019, E102D (12) : 2557 - 2567
[10] GAUSSIAN PROCESS LSTM RECURRENT NEURAL NETWORK LANGUAGE MODELS FOR SPEECH RECOGNITION
Lam, Max W. Y.
Chen, Xie
Hu, Shoukang
Yu, Jianwei
Liu, Xunying
Meng, Helen
2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 7235 - 7239
