Predicting the out-of-vocabulary rate and the required vocabulary size for speech processing applications

Authors
Muller, J
Stahl, H
Lang, M
Keywords
out-of-vocabulary rate; OOV-rate; vocabulary size; text corpus; test corpus; training corpus
DOI
Not available
Chinese Library Classification
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
This paper describes an approach for predicting both the vocabulary size and the resulting out-of-vocabulary rate (OOV-rate) for a hypothetical extension of an existing text corpus. By splitting the original corpus into two different sub-corpora, the vocabulary and the OOV-rate can be determined for that particular constellation. Average values are calculated for all combinations of sub-corpora and can be approximated by analytic function terms. These functions enable easy prediction of the vocabulary size and the OOV-rate. The prediction achieves a relative error below 4.6%.
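The splitting-and-averaging step described in the abstract might be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the corpus is assumed to be a flat list of word tokens, it is cut into `k` contiguous blocks, and the OOV-rate is taken as the fraction of test tokens absent from the training vocabulary. All function names (`split_into_blocks`, `vocab_and_oov`, `average_curves`) are hypothetical.

```python
from itertools import combinations
from statistics import mean


def split_into_blocks(tokens, k):
    """Cut the token list into k contiguous blocks of (nearly) equal length."""
    size = len(tokens) // k
    blocks = [tokens[i * size:(i + 1) * size] for i in range(k - 1)]
    blocks.append(tokens[(k - 1) * size:])  # last block takes the remainder
    return blocks


def vocab_and_oov(train_tokens, test_tokens):
    """Return (vocabulary size of train, token-level OOV-rate of test)."""
    vocab = set(train_tokens)
    if not test_tokens:
        return len(vocab), 0.0
    unseen = sum(1 for t in test_tokens if t not in vocab)
    return len(vocab), unseen / len(test_tokens)


def average_curves(tokens, k):
    """For each number m of training blocks, average over all combinations.

    Returns a list of (mean training size, mean vocabulary size, mean OOV-rate),
    one entry per m = 1 .. k-1. The remaining k-m blocks form the test corpus.
    """
    blocks = split_into_blocks(tokens, k)
    curves = []
    for m in range(1, k):
        sizes, vocabs, oovs = [], [], []
        for train_idx in combinations(range(k), m):
            train = [t for i in train_idx for t in blocks[i]]
            test = [t for i in range(k) if i not in train_idx for t in blocks[i]]
            v, oov = vocab_and_oov(train, test)
            sizes.append(len(train))
            vocabs.append(v)
            oovs.append(oov)
        curves.append((mean(sizes), mean(vocabs), mean(oovs)))
    return curves


if __name__ == "__main__":
    toy_corpus = "a b a c b d a e".split()  # toy data, for illustration only
    for size, vocab, oov in average_curves(toy_corpus, k=4):
        print(size, vocab, round(oov, 3))
```

The averaged points could then be approximated by an analytic function and extrapolated to a larger corpus size. The abstract does not state which functional form the authors use, so that step is left out here.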
Pages: 1922-1925
Page count: 4