Syllable language models for Mandarin speech recognition: Exploiting character language models

Cited by: 18
Authors
Liu, Xunying [1]
Hieronymus, James L. [2]
Gales, Mark J. F. [1 ]
Woodland, Philip C. [1 ]
Affiliations
[1] Univ Cambridge, Dept Engn, Cambridge CB2 1PZ, England
[2] Int Comp Sci Inst, Berkeley, CA 94704 USA
Source: The Journal of the Acoustical Society of America
Keywords
CHINESE-LANGUAGE; ADAPTATION; ALGORITHM
DOI
10.1121/1.4768800
Chinese Library Classification
O42 [Acoustics]
Subject Classification Codes
070206; 082403
Abstract
Mandarin Chinese is based on characters which are syllabic in nature and morphological in meaning. All spoken languages have syllabotactic rules which govern the construction of syllables and their allowed sequences. These constraints are not as restrictive as those learned from word sequences, but they can provide additional useful linguistic information. Hence, it is possible to improve speech recognition performance by appropriately combining these two types of constraints. For the Chinese language considered in this paper, character-level language models (LMs) can be used as a first-level approximation to allowed syllable sequences. To test this idea, word- and character-level n-gram LMs were trained on 2.8 billion words (equivalent to 4.3 billion characters) of text drawn from a wide collection of sources. Both hypothesis-based and model-based combination techniques were investigated for combining word- and character-level LMs. Significant character error rate reductions of up to 7.3% relative were obtained on a state-of-the-art Mandarin Chinese broadcast audio recognition task using an adapted history-dependent multi-level LM that performs a log-linear combination of character- and word-level LMs. This supports the hypothesis that character or syllable sequence models are useful for improving Mandarin speech recognition performance. (C) 2013 Acoustical Society of America. [http://dx.doi.org/10.1121/1.4768800]
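To make the combination referred to above concrete, a log-linear interpolation of a word-level LM and a character-level LM over the same character sequence C, with word segmentation W(C), can be sketched as follows; the weight lambda and the normalization term Z are illustrative notation under our own assumptions, not the paper's exact formulation:

\[
\log P(C) \;=\; \lambda \,\log P_{\mathrm{word}}\!\big(W(C)\big) \;+\; (1-\lambda)\,\log P_{\mathrm{char}}(C) \;-\; \log Z(\lambda)
\]

In a history-dependent variant, the interpolation weight would vary with the decoding context, i.e. \(\lambda \rightarrow \lambda(h)\) for history \(h\). For a fixed weight, \(Z(\lambda)\) is constant across hypotheses and is typically dropped during decoding, so the combined score reduces to a weighted sum of the word- and character-level log-probabilities.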
Pages: 519-528
Page count: 10
Related Papers (50 in total; items [41]-[50] shown)
• [41] Liu, Linda; Gu, Yile; Gourav, Aditya; Gandhe, Ankur; Kalmane, Shashank; Filimonov, Denis; Rastrow, Ariya; Bulyko, Ivan. Domain-Aware Neural Language Models for Speech Recognition. 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2021), 2021: 7373-7377.
• [42] Eckert, W.; Gallwitz, F.; Niemann, H. Combining Stochastic and Linguistic Language Models for Recognition of Spontaneous Speech. 1996 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 1996: 423-426.
• [43] Zuo, Lingyun; Liu, Jian; Wan, Xin. Comparison of Various Neural Network Language Models in Speech Recognition. 2016 3rd International Conference on Information Science and Control Engineering (ICISCE), 2016: 894-898.
• [44] Torres, I.; Varona, A. k-TSS Language Models in Speech Recognition Systems. Computer Speech and Language, 2001, 15(2): 127-149.
• [45] Bellegarda, J. R. Large Vocabulary Speech Recognition with Multispan Statistical Language Models. IEEE Transactions on Speech and Audio Processing, 2000, 8(1): 76-84.
• [46] Tanaka, Tomohiro; Masumura, Ryo; Masataki, Hirokazu; Aono, Yushi. Neural Error Corrective Language Models for Automatic Speech Recognition. 19th Annual Conference of the International Speech Communication Association (INTERSPEECH 2018), 2018: 401-405.
• [47] Georges, Munir; Kanthak, Stephan; Klakow, Dietrich. Transducer-Based Speech Recognition with Dynamic Language Models. 14th Annual Conference of the International Speech Communication Association (INTERSPEECH 2013), 2013: 642-646.
• [48] Arisoy, Ebru; Chen, Stanley F.; Ramabhadran, Bhuvana; Sethy, Abhinav. Converting Neural Network Language Models into Back-off Language Models for Efficient Decoding in Automatic Speech Recognition. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2014, 22(1): 184-192.
• [49] Arisoy, Ebru; Chen, Stanley F.; Ramabhadran, Bhuvana; Sethy, Abhinav. Converting Neural Network Language Models into Back-off Language Models for Efficient Decoding in Automatic Speech Recognition. 2013 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2013: 8242-8246.
• [50] Interpolation of n-gram and Mutual-Information Based Trigger Pair Language Models for Mandarin Speech Recognition. Department of Computer Science, School of Computing, National University of Singapore. Computer Speech and Language, (2): 125-141.