Structured Output Layer Neural Network Language Models for Speech Recognition

Cited by: 49
Authors
Le, Hai-Son [1 ,2 ]
Oparin, Ilya [3 ]
Allauzen, Alexandre [1 ,2 ]
Gauvain, Jean-Luc [3 ]
Yvon, Francois [1 ,2 ]
Affiliations
[1] Univ Paris 11, F-91405 Orsay, France
[2] LIMSI CNRS, F-91405 Orsay, France
[3] LIMSI CNRS, F-91403 Orsay, France
Keywords
Automatic speech recognition; neural network language model; speech-to-text
DOI
10.1109/TASL.2012.2215599
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
This paper extends a novel neural network language model (NNLM) which relies on word clustering to structure the output vocabulary: Structured OUtput Layer (SOUL) NNLM. This model is able to handle arbitrarily-sized vocabularies, hence dispensing with the need for shortlists that are commonly used in NNLMs. Several softmax layers replace the standard output layer in this model. The output structure depends on the word clustering which is based on the continuous word representation determined by the NNLM. Mandarin and Arabic data are used to evaluate the SOUL NNLM accuracy via speech-to-text experiments. Well tuned speech-to-text systems (with error rates around 10%) serve as the baselines. The SOUL model achieves consistent improvements over a classical shortlist NNLM both in terms of perplexity and recognition accuracy for these two languages that are quite different in terms of their internal structure and recognition vocabulary size. An enhanced training scheme is proposed that allows more data to be used at each training iteration of the neural network.
Pages: 195 - 204
Number of pages: 10
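
The abstract above describes an output layer factored by word classes: a softmax over classes followed by a softmax over the words within the predicted word's class, so that the full vocabulary can be covered without a shortlist. Below is a minimal sketch of such a class-factored output layer. The flat two-level decomposition, the hard word-to-class assignment, and all dimensions and parameter names are illustrative assumptions; the actual SOUL model combines a shortlist with a deeper clustering tree derived from the NNLM's continuous word representations.

# Minimal sketch of a class-factored ("structured") softmax output layer,
# in the spirit of (but not identical to) the SOUL NNLM summarized above.
# All dimensions, names, and the two-level hard clustering are assumptions.
import numpy as np

rng = np.random.default_rng(0)

V, C, H = 10, 3, 8                     # vocabulary size, number of classes, hidden size
word_to_class = np.arange(V) % C       # hypothetical hard clustering of words into classes

# One softmax over classes, plus one softmax per class over its member words.
W_class = rng.standard_normal((C, H)) * 0.1
W_word = rng.standard_normal((V, H)) * 0.1

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def word_probability(h, w):
    """P(w | h) = P(class(w) | h) * P(w | class(w), h)."""
    c = word_to_class[w]
    p_class = softmax(W_class @ h)[c]
    members = np.flatnonzero(word_to_class == c)      # words belonging to class c
    p_in_class = softmax(W_word[members] @ h)
    return p_class * p_in_class[np.flatnonzero(members == w)[0]]

h = rng.standard_normal(H)             # hidden state from the NNLM's projection/hidden layers
probs = np.array([word_probability(h, w) for w in range(V)])
assert np.isclose(probs.sum(), 1.0)    # the factored distribution still normalizes over the vocabulary
print(probs)

The point of the factorization is cost: predicting one word only requires normalizing over the C classes plus the words inside one class, rather than over all V words, which is what makes an arbitrarily large output vocabulary tractable.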