COPING WITH OUT-OF-VOCABULARY WORDS: OPEN VERSUS HUGE VOCABULARY ASR

Cited by: 3
Authors
Gerosa, Matteo [1]
Federico, Marcello [1]
Affiliation
[1] FBK-irst, Fondazione Bruno Kessler, I-38100 Povo (TN), Italy
Keywords
Automatic Speech Recognition; Open-vocabulary speech recognition; OOV words
DOI
10.1109/ICASSP.2009.4960583
Chinese Library Classification (CLC) number
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
This paper investigates methods for coping with out-of-vocabulary (OOV) words in a large-vocabulary speech recognition task, namely the automatic transcription of Italian broadcast news. Two alternative ways of augmenting a 64K-word recognition vocabulary and language model are compared: adding extra words, together with their phonetic transcriptions, up to 1.2M words; or extending the language model with so-called graphones, i.e. sub-word units made of phone-character sequences. Graphones and phonetic transcriptions of words are generated automatically by adapting an off-the-shelf statistical machine translation toolkit. We found that both the word-based and the graphone-based extensions improve recognition performance, with the former performing significantly better than the latter. In addition, the word-based extension shows interesting potential even under conditions of little supervision: when the grapheme-to-phoneme translation system is trained on only 2K manually verified transcriptions, the final word error rate increases by just 3% relative, compared with starting from a lexicon of 64K words.
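The graphone idea described in the abstract can be illustrated with a minimal Python sketch. It assumes that a letter-to-phone alignment for each OOV word is already available (the paper obtains graphones by adapting a statistical machine translation toolkit, which is not reproduced here). The helper names, the `letters:phones` token format, and the phone symbols chosen for the Italian word "chiesa" are all hypothetical. The sketch shows how an OOV word can enter the language model as a sequence of graphone tokens from which both the spelling and the pronunciation can be read back.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical representation: a graphone pairs a chunk of letters with a
# chunk of phones (either side may be empty).
Graphone = Tuple[str, Tuple[str, ...]]


def to_graphones(alignment: Sequence[Tuple[str, Sequence[str]]]) -> List[Graphone]:
    """Convert a letter-to-phone alignment (assumed given) into graphones."""
    return [(letters, tuple(phones)) for letters, phones in alignment]


def graphone_token(g: Graphone) -> str:
    """Render a graphone as one LM token, e.g. 'ch:k'; '_' marks an empty side."""
    letters, phones = g
    return f"{letters or '_'}:{'+'.join(phones) or '_'}"


def spelling(graphones: Sequence[Graphone]) -> str:
    """Concatenating the letter sides spells the word."""
    return "".join(letters for letters, _ in graphones)


def pronunciation(graphones: Sequence[Graphone]) -> List[str]:
    """Concatenating the phone sides gives the pronunciation."""
    return [p for _, phones in graphones for p in phones]


def lm_tokens(words: Sequence[str], oov: Dict[str, Sequence[Graphone]]) -> List[str]:
    """Keep in-vocabulary words as-is; replace OOV words by graphone tokens."""
    out: List[str] = []
    for w in words:
        if w in oov:
            out.extend(graphone_token(g) for g in oov[w])
        else:
            out.append(w)
    return out


# Illustrative alignment for "chiesa" (phone symbols are an assumption).
chiesa = to_graphones([("ch", ["k"]), ("i", ["j"]), ("e", ["E"]), ("s", ["z"]), ("a", ["a"])])
print(lm_tokens(["la", "chiesa"], {"chiesa": chiesa}))
# ['la', 'ch:k', 'i:j', 'e:E', 's:z', 'a:a']
print(spelling(chiesa), pronunciation(chiesa))
# chiesa ['k', 'j', 'E', 'z', 'a']
```

Because every token carries both letters and phones, a recognized graphone sequence can be turned back into a written word by concatenating its letter sides, which is what lets a graphone-extended language model output words that were never in the recognition vocabulary.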
Pages: 4313-4316
Number of pages: 4