Predicting the out-of-vocabulary rate and the required vocabulary size for speech processing applications

Times cited: 0

Authors:

Muller, J

Stahl, H

Lang, M

Affiliations:

Source:

ICSLP 96 - FOURTH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, PROCEEDINGS, VOLS 1-4 | 1996

Keywords:

out-of-vocabulary rate; OOV-rate; vocabulary size; text corpus; test corpus; training corpus

DOI:

Not available

Chinese Library Classification number:

O42 [Acoustics]

Discipline classification codes:

070206; 082403

Abstract:

This paper describes an approach for predicting both the vocabulary size and the resulting out-of-vocabulary rate (OOV-rate) for a hypothetical extension of an existing text corpus. By splitting the original corpus into two different sub-corpora, the vocabulary size and the OOV-rate can be determined for that particular split. Average values are calculated over all combinations of sub-corpora and can be approximated by analytic functions. These functions make it easy to predict the vocabulary size and the OOV-rate. The predictions have a relative error below 4.6%.
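The abstract outlines the procedure but does not give the function terms. The sketch below is one possible reading, based on several assumptions:

- The corpus is cut into equal-sized chunks.
- A training sub-corpus is any combination of k chunks, and each remaining chunk serves in turn as the test sub-corpus.
- Both averaged curves follow power laws. This is a Heaps'-law-style form for vocabulary growth and a power-law decay for the OOV-rate. It is an assumption, not the authors' stated formula.

The names `split_corpus`, `vocab_and_oov`, `averaged_curve` and `fit_power_law` are hypothetical.

```python
import itertools
import math
import random
from typing import Iterable, List, Sequence, Tuple


def split_corpus(tokens: Sequence[str], n_parts: int) -> List[List[str]]:
    """Split a token sequence into n_parts contiguous chunks of equal size.

    Trailing tokens that do not fill a whole chunk are dropped.
    """
    size = len(tokens) // n_parts
    return [list(tokens[i * size:(i + 1) * size]) for i in range(n_parts)]


def vocab_and_oov(train: Sequence[str], test: Sequence[str]) -> Tuple[int, float]:
    """Vocabulary size of `train` and the OOV rate of `test` tokens against it."""
    vocab = set(train)
    oov_tokens = sum(1 for w in test if w not in vocab)
    return len(vocab), oov_tokens / len(test)


def averaged_curve(parts: List[List[str]],
                   k_values: Iterable[int]) -> List[Tuple[int, float, float]]:
    """For each training size k (in chunks), average vocabulary size and OOV rate
    over every combination of k training chunks, testing on each remaining chunk.

    Assumed split scheme. Returns (training tokens, mean vocab size, mean OOV rate).
    """
    n = len(parts)
    curve = []
    for k in k_values:
        if not 1 <= k < n:
            raise ValueError("k must leave at least one chunk for testing")
        vocab_sizes, oov_rates = [], []
        for train_idx in itertools.combinations(range(n), k):
            train = [w for i in train_idx for w in parts[i]]
            for test_idx in range(n):
                if test_idx in train_idx:
                    continue
                v, r = vocab_and_oov(train, parts[test_idx])
                vocab_sizes.append(v)
                oov_rates.append(r)
        curve.append((k * len(parts[0]),
                      sum(vocab_sizes) / len(vocab_sizes),
                      sum(oov_rates) / len(oov_rates)))
    return curve


def fit_power_law(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of y = a * x**b in log-log space; returns (a, b).

    The power-law form is an assumption. Points with x <= 0 or y <= 0
    (e.g. an OOV rate of 0) are skipped because their log is undefined.
    """
    pts = [(math.log(x), math.log(y)) for x, y in zip(xs, ys) if x > 0 and y > 0]
    if len(pts) < 2:
        raise ValueError("need at least two positive points to fit")
    mx = sum(p[0] for p in pts) / len(pts)
    my = sum(p[1] for p in pts) / len(pts)
    sxx = sum((p[0] - mx) ** 2 for p in pts)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in pts)
    b = sxy / sxx
    a = math.exp(my - b * mx)
    return a, b


if __name__ == "__main__":
    # Synthetic Zipf-like corpus standing in for a real text corpus.
    rng = random.Random(0)
    words = [f"w{i}" for i in range(5000)]
    weights = [1.0 / (i + 1) for i in range(5000)]
    tokens = rng.choices(words, weights=weights, k=12000)

    curve = averaged_curve(split_corpus(tokens, 6), range(1, 6))
    ns = [c[0] for c in curve]
    a_v, b_v = fit_power_law(ns, [c[1] for c in curve])
    a_r, b_r = fit_power_law(ns, [c[2] for c in curve])

    # Extrapolate to a hypothetical corpus twice the size of the original.
    target = 2 * len(tokens)
    print(f"predicted vocabulary size: {a_v * target ** b_v:.0f}")
    print(f"predicted OOV rate: {100 * a_r * target ** b_r:.2f}%")
```

Run as a script, this extrapolates both fitted curves to a corpus twice the original size. A power law grows without bound. For a closed or small vocabulary, a saturating function may therefore fit better than the form assumed here.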


Pages: 1922-1925

Number of pages: 4

Related articles (50 in total):

[1] Phoneme-to-grapheme conversion for out-of-vocabulary words in large vocabulary speech recognition
Decadt, B
Duchateau, J
Daelemans, W
Wambacq, P
ASRU 2001: IEEE WORKSHOP ON AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING, CONFERENCE PROCEEDINGS, 2001, pp. 413-416
[2] Out-of-Vocabulary Word Detection in a Speech-to-Speech Translation System
Kuo, Hong-Kwang
Kislal, Ellen Eide
Mangu, Lidia
Soltau, Hagen
Beran, Tomas
2014 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2014
[3] Transcription of out-of-vocabulary words in large vocabulary speech recognition based on phoneme-to-grapheme conversion
Decadt, B
Duchateau, J
Daelemans, W
Wambacq, P
2002 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS I-IV, PROCEEDINGS, 2002, pp. 861-864
[4] Out-of-Vocabulary Word Detection and Beyond
Kombrink, Stefan
Hannemann, Mirko
Burget, Lukas
DETECTION AND IDENTIFICATION OF RARE AUDIOVISUAL CUES, 2012, vol. 384, pp. 57-65
[6] Finding Recurrent Out-of-Vocabulary Words
Qin, Long
Rudnicky, Alexander
14TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2013), VOLS 1-5, 2013, pp. 2241-2245
[7] Dynamic out-of-vocabulary word registration to language model for speech recognition
Kitaoka, Norihide
Chen, Bohan
Obashi, Yuya
EURASIP JOURNAL ON AUDIO SPEECH AND MUSIC PROCESSING, 2021, 2021(1)
[8] Improving out-of-vocabulary name resolution
Palmer, DD
Ostendorf, M
COMPUTER SPEECH AND LANGUAGE, 2005, 19(1), pp. 107-128
[9] Coping with Out-of-Vocabulary Words: Open versus Huge Vocabulary ASR
Gerosa, Matteo
Federico, Marcello
2009 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS 1-8, PROCEEDINGS, 2009, pp. 4313-4316
[10] Enhancing Out-of-Vocabulary Estimation with Subword Attention
Patel, Raj
Domeniconi, Carlotta
FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, ACL 2023, 2023, pp. 3592-3601
