Syllable Based Language Model for Large Vocabulary Continuous Speech Recognition of Polish

被引:0
|
作者
Majewski, Piotr [1 ]
机构
[1] Univ Lodz, Fac Math & Comp Sci, PL-90238 Lodz, Poland
来源
关键词
Polish; large vocabulary continuous speech recognition; language modeling; sub-word units; syllable-based units;
D O I
暂无
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
Most of state-of-the-art large vocabulary continuous speech recognition systems use word-based n-gram language models. Such models are not optimal solution for inflectional or agglutinative languages. The Polish language is highly inflectional one and requires a very large corpora to create a sufficient language model with the small out-of-vocabulary ratio. We propose a syllable-based language model. which is better suited to highly inflectional language like Polish. In case of lack of resources (i.e. small corpora) syllable-based model outperforms word-based models in terms of number of out-of-vocabulary units (syllables in our model). Such model is an approximation of the morphene-based model for Polish. In our paper, we show results of evaluation of syllable based model and its usefulness in speech recognition tasks.
引用
收藏
页码:397 / 401
页数:5
相关论文
共 50 条
  • [1] Syllable based language model for large vocabulary continuous speech recognition of Uyghur
    Silamu, W. (wushour@xju.edu.cn), 1600, Tsinghua University (53):
  • [2] Syllable-based large vocabulary continuous speech recognition
    Ganapathiraju, A
    Hamaker, J
    Picone, J
    Ordowski, M
    Doddington, GR
    IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, 2001, 9 (04): : 358 - 366
  • [3] Development of Large Vocabulary Continuous Speech Recognition for Polish
    Demenko, G.
    Szymanski, M.
    Cecko, R.
    Kusmierek, E.
    Lange, M.
    Wegner, K.
    Klessa, K.
    Owsianny, M.
    ACTA PHYSICA POLONICA A, 2012, 121 (1A) : A86 - A91
  • [4] A unified language model for large vocabulary continuous speech recognition of Turkish
    Arisoy, Ebru
    Dutagaci, Helin
    Arslan, Levent M.
    SIGNAL PROCESSING, 2006, 86 (10) : 2844 - 2862
  • [5] Continuous Mandarin speech recognition for Chinese language with large vocabulary based on segmental probability model
    Shen, JL
    IEE PROCEEDINGS-VISION IMAGE AND SIGNAL PROCESSING, 1998, 145 (05): : 309 - 315
  • [6] A large vocabulary continuous speech recognition system for Persian language
    Sameti, Hossein
    Veisi, Hadi
    Bahrani, Mohammad
    Babaali, Bagher
    Hosseinzadeh, Khosro
    EURASIP JOURNAL ON AUDIO SPEECH AND MUSIC PROCESSING, 2011, : 1 - 12
  • [7] A large vocabulary continuous speech recognition system for Persian language
    Hossein Sameti
    Hadi Veisi
    Mohammad Bahrani
    Bagher Babaali
    Khosro Hosseinzadeh
    EURASIP Journal on Audio, Speech, and Music Processing, 2011
  • [8] Connectionist language modeling for large vocabulary continuous speech recognition
    Schwenk, H
    Gauvain, JL
    2002 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS I-IV, PROCEEDINGS, 2002, : 765 - 768
  • [9] A usage of the syllable unit based on morphological statistics in Korean large vocabulary continuous speech recognition system
    Hyok-Chol Ri
    International Journal of Speech Technology, 2019, 22 : 971 - 977
  • [10] A usage of the syllable unit based on morphological statistics in Korean large vocabulary continuous speech recognition system
    Ri, Hyok-Chol
    INTERNATIONAL JOURNAL OF SPEECH TECHNOLOGY, 2019, 22 (04) : 971 - 977