RNN Language Model Estimation for Out-of-Vocabulary Words

Cited by: 0
Authors
Illina, Irina [1]
Fohr, Dominique [1]
Affiliations
[1] Univ Lorraine, CNRS, INRIA, LORIA, MultiSpeech Team, F-54000 Nancy, France
Source
HUMAN LANGUAGE TECHNOLOGY. CHALLENGES FOR COMPUTER SCIENCE AND LINGUISTICS, LTC 2017 | 2020 / Vol. 12598
Keywords
Speech recognition; Neural networks; Vocabulary extension; Out-of-vocabulary words; Proper names;
DOI
10.1007/978-3-030-66527-2_15
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
One important issue for speech recognition systems is out-of-vocabulary (OOV) words. These words, often proper nouns or new words, are essential for documents to be transcribed correctly. Thus, they must be integrated into the language model (LM) and the lexicon of the speech recognition system. This article proposes new approaches to OOV proper noun probability estimation using a Recurrent Neural Network Language Model (RNNLM). The proposed approaches are based on the notion of the closest in-vocabulary (IV) words (the "list of brothers") to a given OOV proper noun. The RNNLM probabilities of these words are used to estimate the probabilities of OOV proper nouns. Three methods for retrieving the relevant list of brothers are studied. The main advantages of the proposed approaches are that the RNNLM is not retrained and that its architecture is kept intact. Experiments on real text data from the website of the Euronews channel show relative perplexity reductions of about 14% compared to the baseline RNNLM.
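The abstract does not give the estimation formula, so the following is only a minimal sketch of the general idea: an OOV word borrows probability from its closest IV words while the model itself is left untouched. Everything specific here is an assumption made for illustration and not the authors' method: retrieving brothers by cosine similarity of word vectors, averaging the brothers' probabilities, the scaling factor `alpha`, the final renormalisation, and the toy vectors and probabilities.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def closest_iv_words(oov_vec, iv_vecs, k):
    """Return the k in-vocabulary words most similar to the OOV vector.

    Assumption: similarity in an embedding space is one plausible way to
    build a "list of brothers"; the paper studies three retrieval methods
    that the abstract does not detail.
    """
    ranked = sorted(iv_vecs, key=lambda w: cosine(oov_vec, iv_vecs[w]), reverse=True)
    return ranked[:k]


def extend_distribution(iv_probs, brothers_by_oov, alpha=0.1):
    """Add OOV words to a next-word distribution without changing the model.

    Assumption: each OOV word receives alpha times the mean probability of
    its brothers, and the extended distribution is renormalised to sum to 1.
    """
    extended = dict(iv_probs)
    for oov, brothers in brothers_by_oov.items():
        extended[oov] = alpha * sum(iv_probs[b] for b in brothers) / len(brothers)
    total = sum(extended.values())
    return {w: p / total for w, p in extended.items()}


# Toy data standing in for word vectors and for one softmax output of an RNNLM.
iv_vecs = {"paris": [0.9, 0.1], "london": [0.8, 0.2], "eats": [0.1, 0.9]}
iv_probs = {"paris": 0.5, "london": 0.3, "eats": 0.2}
oov_vec = [0.85, 0.15]  # hypothetical vector for an OOV proper noun

brothers = closest_iv_words(oov_vec, iv_vecs, k=2)
probs = extend_distribution(iv_probs, {"nancy": brothers})
```

Because the OOV probability is computed from the existing output distribution, this kind of scheme needs no retraining and no change to the output layer, which matches the advantage the abstract claims.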
Pages: 199-211
Number of pages: 13