COPING WITH OUT-OF-VOCABULARY WORDS: OPEN VERSUS HUGE VOCABULARY ASR

Cited by: 3
Authors
Gerosa, Matteo [1]
Federico, Marcello [1]
Affiliations
[1] FBK Irst Fdn Bruno Kessler, I-38100 Povo, TN, Italy
Keywords
Automatic Speech Recognition; Open-vocabulary speech recognition; OOV words;
DOI
10.1109/ICASSP.2009.4960583
Chinese Library Classification number
O42 [Acoustics];
Discipline classification code
070206 ; 082403 ;
Abstract
This paper investigates methods for coping with out-of-vocabulary words in a large-vocabulary speech recognition task, namely the automatic transcription of Italian broadcast news. Two alternative ways of augmenting a 64K (thousand)-word recognition vocabulary and language model are compared: introducing extra words with their phonetic transcriptions, up to 1.2M (million) words, or extending the language model with so-called graphones, i.e. sub-word units made of phone-character sequences. Graphones and phonetic transcriptions of words are automatically generated by adapting an off-the-shelf statistical machine translation toolkit. We found that both the word-based and the graphone-based extensions improve recognition performance, with the former performing significantly better than the latter. In addition, the word-based extension approach shows interesting potential even under conditions of little supervision. In fact, when the grapheme-to-phoneme translation system is trained with only 2K manually verified transcriptions, the final word error rate increases by just 3% relative, compared with starting from a lexicon of 64K words.
Pages: 4313 - 4316
Page count: 4
Related papers
50 in total
  • [31] A Spoken Term Detection Framework for Recovering Out-of-Vocabulary Words Using the Web
    Parada, Carolina
    Sethy, Abhinav
    Dredze, Mark
    Jelinek, Frederick
    11TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2010 (INTERSPEECH 2010), VOLS 1-2, 2010, : 1269 - +
  • [32] Improved Neural Bag-of-Words Model to Retrieve Out-of-Vocabulary Words in Speech Recognition
    Sheikh, Imran
    Illina, Irina
    Fohr, Dominique
    Linares, Georges
    17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES, 2016, : 675 - 679
  • [33] Enhancing Out-of-Vocabulary Estimation with Subword Attention
    Patel, Raj
    Domeniconi, Carlotta
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, ACL 2023, 2023, : 3592 - 3601
  • [34] Out-of-vocabulary word rejection algorithm in Korean variable vocabulary word recognition
    Moon, KS
    Kim, YJ
    Kim, HR
    Chung, JH
    ISCAS 2000: IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS - PROCEEDINGS, VOL V: EMERGING TECHNOLOGIES FOR THE 21ST CENTURY, 2000, : 53 - 56
  • [35] English Out-of-Vocabulary Lexical Evaluation Task
    Wang, Han
    Wang, Ye
    Zhang, Xinxiang
    Lu, Mi
    Choe, Yoonsuck
    Cao, Jingjing
    2019 IEEE 17TH INTERNATIONAL CONFERENCE ON INDUSTRIAL INFORMATICS (INDIN), 2019, : 1468 - 1472
  • [36] Improving Recognition of Out-of-vocabulary Words in E2E Code-switching ASR by Fusing Speech Generation Methods
    Ye, Lingxuan
    Cheng, Gaofeng
    Yang, Runyan
    Yang, Zehui
    Tian, Sanli
    Zhang, Pengyuan
    Yan, Yonghong
    INTERSPEECH 2022, 2022, : 3163 - 3167
  • [37] Predicting the out-of-vocabulary rate and the required vocabulary size for speech processing applications
    Muller, J
    Stahl, H
    Lang, M
    ICSLP 96 - FOURTH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, PROCEEDINGS, VOLS 1-4, 1996, : 1922 - 1925
  • [38] SYSTEM COMBINATION FOR OUT-OF-VOCABULARY WORD DETECTION
    Qin, Long
    Sun, Ming
    Rudnicky, Alexander
    2012 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2012, : 4817 - 4820
  • [39] Incorporate web search technology to solve out-of-vocabulary words in Chinese word segmentation
    Qiao, Wei
    Sun, Maosong
    PACLIC 23 - Proceedings of the 23rd Pacific Asia Conference on Language, Information and Computation, 2009, 2 : 454 - 463
  • [40] Replacing Out-of-Vocabulary Words with an Appropriate Synonym Based on Word2VnCR
    Kim, Jeongin
    Hong, Taekeun
    Kim, Pankoo
    MOBILE INFORMATION SYSTEMS, 2021, 2021