Structured Output Layer Neural Network Language Models for Speech Recognition

Cited by: 49
Authors:
Le, Hai-Son [1 ,2 ]
Oparin, Ilya [3 ]
Allauzen, Alexandre [1 ,2 ]
Gauvain, Jean-Luc [3 ]
Yvon, Francois [1 ,2 ]
Affiliations:
[1] Univ Paris 11, F-91405 Orsay, France
[2] LIMSI CNRS, F-91405 Orsay, France
[3] LIMSI CNRS, F-91403 Orsay, France
Keywords:
Automatic speech recognition; neural network language model; speech-to-text
DOI:
10.1109/TASL.2012.2215599
Chinese Library Classification:
O42 [Acoustics]
Subject Classification Codes:
070206; 082403
Abstract:
This paper extends a novel neural network language model (NNLM) which relies on word clustering to structure the output vocabulary: Structured OUtput Layer (SOUL) NNLM. This model is able to handle arbitrarily-sized vocabularies, hence dispensing with the need for shortlists that are commonly used in NNLMs. Several softmax layers replace the standard output layer in this model. The output structure depends on the word clustering which is based on the continuous word representation determined by the NNLM. Mandarin and Arabic data are used to evaluate the SOUL NNLM accuracy via speech-to-text experiments. Well tuned speech-to-text systems (with error rates around 10%) serve as the baselines. The SOUL model achieves consistent improvements over a classical shortlist NNLM both in terms of perplexity and recognition accuracy for these two languages that are quite different in terms of their internal structure and recognition vocabulary size. An enhanced training scheme is proposed that allows more data to be used at each training iteration of the neural network.
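The abstract describes the key idea of the SOUL model: the single large softmax over the full vocabulary is replaced by a hierarchy of smaller softmax layers, so that P(w | h) factors through the class of w, e.g. P(w | h) = P(class(w) | h) · P(w | class(w), h). Below is a minimal, illustrative sketch of such a class-factored output layer; the vocabulary, clustering, and dimensions are toy assumptions, not taken from the paper (the paper derives the clustering from the NNLM's continuous word representations, whereas here the classes are fixed by hand).

```python
import math
import random

random.seed(0)

HIDDEN_DIM = 8
vocab = ["the", "cat", "sat", "on", "mat", "dog", "ran", "home"]
# Toy hard clustering of the vocabulary into classes (illustrative only).
word2class = {"the": 0, "on": 0, "cat": 1, "dog": 1, "mat": 1, "home": 1,
              "sat": 2, "ran": 2}
classes = sorted(set(word2class.values()))
class_members = {c: [w for w in vocab if word2class[w] == c] for c in classes}

def rand_matrix(rows, cols):
    return [[random.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

# One softmax over classes, plus one small softmax per class over its words,
# instead of a single softmax over the whole vocabulary.
W_class = rand_matrix(len(classes), HIDDEN_DIM)
W_word = {c: rand_matrix(len(class_members[c]), HIDDEN_DIM) for c in classes}

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def word_prob(word, h):
    """P(word | h), factored through the word's class."""
    c = word2class[word]
    class_scores = [sum(w * x for w, x in zip(row, h)) for row in W_class]
    p_class = softmax(class_scores)[classes.index(c)]
    members = class_members[c]
    word_scores = [sum(w * x for w, x in zip(row, h)) for row in W_word[c]]
    p_in_class = softmax(word_scores)[members.index(word)]
    return p_class * p_in_class

# The factored probabilities still form a proper distribution over the vocab.
h = [random.uniform(-1, 1) for _ in range(HIDDEN_DIM)]
total = sum(word_prob(w, h) for w in vocab)
print(round(total, 6))  # → 1.0
```

The benefit is in the softmax cost: normalizing over |classes| plus one class's members is far cheaper than normalizing over the full vocabulary, which is what lets the SOUL approach handle arbitrarily large vocabularies without a shortlist.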
Pages: 195-204 (10 pages)
Related Papers (50 total):
  • [21] Mixed Precision Low-Bit Quantization of Neural Network Language Models for Speech Recognition
    Xu, Junhao
    Yu, Jianwei
    Hu, Shoukang
    Liu, Xunying
    Meng, Helen
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2021, 29 : 3679 - 3693
  • [22] Neural candidate-aware language models for speech recognition
    Tanaka, Tomohiro
    Masumura, Ryo
    Oba, Takanobu
    COMPUTER SPEECH AND LANGUAGE, 2021, 66
  • [23] DOMAIN-AWARE NEURAL LANGUAGE MODELS FOR SPEECH RECOGNITION
    Liu, Linda
    Gu, Yile
    Gourav, Aditya
    Gandhe, Ankur
    Kalmane, Shashank
    Filimonov, Denis
    Rastrow, Ariya
    Bulyko, Ivan
    2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 7373 - 7377
  • [24] Neural Error Corrective Language Models for Automatic Speech Recognition
    Tanaka, Tomohiro
    Masumura, Ryo
    Masataki, Hirokazu
    Aono, Yushi
    19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018, : 401 - 405
  • [25] INTERPRETING THE HIDDEN LAYER IN A NEURAL NETWORK FOR A SPEECH RECOGNITION TASK
    KAMM, C
    STREETER, LA
    KANE-ESRIG, Y
    NEURAL NETWORKS FROM MODELS TO APPLICATIONS, 1989, : 523 - 530
  • [26] Structured Discriminative Models for Speech Recognition
    Gales, Mark
    Watanabe, Shinji
    Fosler-Lussier, Eric
    IEEE SIGNAL PROCESSING MAGAZINE, 2012, 29 (06) : 70 - 81
  • [27] Structured Discriminative Models for Speech Recognition
    Gales, Mark
    2012 8TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING, 2012, : XXII - XXII
  • [28] Conversion of Recurrent Neural Network Language Models to Weighted Finite State Transducers for Automatic Speech Recognition
    Lecorve, Gwenole
    Motlicek, Petr
    13TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2012 (INTERSPEECH 2012), VOLS 1-3, 2012, : 1666 - 1669
  • [29] Recurrent Neural Network Language Model with Part-of-speech for Mandarin Speech Recognition
    Gong, Caixia
    Li, Xiangang
    Wu, Xihong
    2014 9TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP), 2014, : 459 - 463
  • [30] Cross-sentence Neural Language Models for Conversational Speech Recognition
    Chiu, Shih-Hsuan
    Lo, Tien-Hong
    Chen, Berlin
    2021 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2021,