COPING WITH OUT-OF-VOCABULARY WORDS: OPEN VERSUS HUGE VOCABULARY ASR

Cited by: 3
Authors
Gerosa, Matteo [1]
Federico, Marcello [1]
Affiliations
[1] FBK Irst Fdn Bruno Kessler, I-38100 Povo, TN, Italy
Keywords
Automatic Speech Recognition; Open-vocabulary speech recognition; OOV words
DOI
10.1109/ICASSP.2009.4960583
Chinese Library Classification (CLC) number
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
This paper investigates methods for coping with out-of-vocabulary (OOV) words in a large-vocabulary speech recognition task, namely the automatic transcription of Italian broadcast news. Two alternative ways of augmenting a 64K-word (64,000-word) recognition vocabulary and language model are compared: introducing extra words, together with their phonetic transcriptions, up to 1.2M (1.2 million) words, or extending the language model with so-called graphones, i.e. sub-word units made of phone-character sequences. Graphones and phonetic transcriptions of words are generated automatically by adapting an off-the-shelf statistical machine translation toolkit. We found that both the word-based and the graphone-based extensions improve recognition performance, with the former performing significantly better than the latter. In addition, the word-based extension shows interesting potential even under conditions of little supervision: when the grapheme-to-phoneme translation system is trained with only 2K manually verified transcriptions, the final word error rate increases by just 3% relative compared with starting from a lexicon of 64K words.
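The two quantities the abstract relies on, the out-of-vocabulary rate and a relative change in word error rate, can be sketched as follows. This is a minimal illustration, not the authors' evaluation code: the function names, the toy Italian sentence, the toy vocabulary, and the WER values 20.0 and 20.6 are all hypothetical and chosen only to make the arithmetic visible.

```python
def oov_rate(tokens, vocabulary):
    """Fraction of running tokens that are missing from the vocabulary."""
    if not tokens:
        return 0.0
    missing = sum(1 for token in tokens if token not in vocabulary)
    return missing / len(tokens)


def relative_increase(baseline, new):
    """Change of `new` relative to `baseline` (0.03 means 3% relative)."""
    return (new - baseline) / baseline


# Hypothetical toy data: one word of the sentence is not in the vocabulary.
vocabulary = {"il", "governo", "ha", "approvato", "la", "legge"}
tokens = "il governo ha approvato la finanziaria".split()
rate = oov_rate(tokens, vocabulary)

# Hypothetical WER values (in percent), not results from the paper.
wer_change = relative_increase(20.0, 20.6)

print(f"OOV rate: {rate:.3f}")
print(f"Relative WER increase: {wer_change:.1%}")
```

A relative figure is measured against the baseline error rate, so a 3% relative increase on a hypothetical 20.0% baseline corresponds to 0.6 points absolute.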
Pages: 4313-4316
Page count: 4