Predicting the out-of-vocabulary rate and the required vocabulary size for speech processing applications

Cited by: 0
Authors
Muller, J
Stahl, H
Lang, M
Institutions
Keywords
out-of-vocabulary rate; OOV-rate; vocabulary size; text corpus; test corpus; training corpus;
DOI
Not available
Chinese Library Classification (CLC) number
O42 [Acoustics];
Discipline classification code
070206; 082403;
Abstract
This paper describes an approach for predicting both the vocabulary size and the resulting out-of-vocabulary rate (OOV-rate) for a hypothetical extension of an existing text corpus. By splitting the original corpus into two different sub-corpora, the vocabulary and the OOV-rate can be determined for that particular constellation. Average values are calculated over all combinations of sub-corpora and can be approximated by analytic function terms. These functions enable easy prediction of the vocabulary size and the OOV-rate. The predictions have a relative error below 4.6%.
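The procedure outlined in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not say how the sub-corpora are formed or which analytic functions are fitted, so the sketch makes several assumptions: the corpus is cut into equal contiguous blocks, each training/test pair of sub-corpora is assembled from those blocks, and a power law `y = a * x**b` stands in for the unspecified analytic function. The names `vocab_and_oov`, `average_curves`, and `fit_power_law` are hypothetical.

```python
import math
from itertools import combinations
from statistics import mean


def vocab_and_oov(train_tokens, test_tokens):
    """Return (vocabulary size of train, OOV rate of test against that vocabulary)."""
    vocab = set(train_tokens)
    if not test_tokens:
        return len(vocab), 0.0
    oov = sum(1 for tok in test_tokens if tok not in vocab)
    return len(vocab), oov / len(test_tokens)


def average_curves(tokens, n_parts=4):
    """Average vocabulary size and OOV rate per training size.

    Assumption: the corpus is cut into n_parts equal contiguous blocks
    (leftover tokens at the end are ignored). For every k, all choices of
    k blocks form the training sub-corpus and the remaining blocks form
    the test sub-corpus.
    """
    size = len(tokens) // n_parts
    parts = [tokens[i * size:(i + 1) * size] for i in range(n_parts)]
    curves = {}
    for k in range(1, n_parts):
        results = []
        for train_idx in combinations(range(n_parts), k):
            train = [t for i in train_idx for t in parts[i]]
            test = [t for i in range(n_parts) if i not in train_idx
                    for t in parts[i]]
            results.append(vocab_and_oov(train, test))
        # Key: training size in tokens; value: (mean vocab size, mean OOV rate).
        curves[k * size] = (mean(v for v, _ in results),
                            mean(o for _, o in results))
    return curves


def fit_power_law(xs, ys):
    """Least-squares fit of y = a * x**b in log-log space.

    Assumption: a power law is only a stand-in for the paper's analytic
    functions. All xs and ys must be strictly positive.
    """
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    mx, my = mean(lx), mean(ly)
    b = (sum((x - mx) * (y - my) for x, y in zip(lx, ly))
         / sum((x - mx) ** 2 for x in lx))
    a = math.exp(my - b * mx)
    return a, b
```

Under these assumptions, fitting `fit_power_law` to the training sizes and averaged values returned by `average_curves`, then evaluating `a * x**b` at a corpus size larger than the existing one, would give the kind of extrapolation the abstract describes. A fit to the OOV-rate curve requires every averaged OOV rate to be nonzero.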
Pages: 1922-1925
Page count: 4