Finding Recurrent Out-of-Vocabulary Words

Cited by: 0
Authors
Qin, Long [1]
Rudnicky, Alexander [1]
Affiliations
[1] Carnegie Mellon Univ, Language Technol Inst, Pittsburgh, PA 15213 USA
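Keywords
OOV word detection; distributed evidence; bottom-up clustering
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract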
Out-of-vocabulary (OOV) words can appear more than once in a conversation or over a period of time. Such multiple instances of the same OOV word provide valuable information for estimating the pronunciation or the part-of-speech (POS) tag of the word. But in a conventional OOV word detection system, each OOV word is recognized and treated individually. We therefore investigated how to identify recurrent OOV words in speech recognition. Specifically, we propose to cluster multiple instances of the same OOV word using a bottom-up approach. Phonetic, acoustic and contextual features were collected to measure the distance between OOV candidates. The experimental results show that the bottom-up clustering approach is very effective at detecting the recurrence of OOV words. We also found that the phonetic feature is better than the acoustic and contextual features, and the best performance is achieved when combining all features.
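
As a rough illustration of the bottom-up clustering idea in the abstract, the sketch below groups OOV candidates by a combined distance. It is not the authors' implementation: the abstract does not say how each feature distance is computed or how they are combined, so the normalized phone edit distance, the Jaccard distance over context words, the weights, the average-linkage rule, and the stopping threshold are all assumptions made for this example. The acoustic feature is left out for brevity.

```python
from itertools import combinations


def edit_distance(a, b):
    """Levenshtein distance between two sequences (here, phone sequences)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (x != y)))  # substitution
        prev = curr
    return prev[-1]


def phonetic_distance(a, b):
    """Edit distance normalized by the longer sequence, in [0, 1]."""
    longest = max(len(a), len(b))
    return edit_distance(a, b) / longest if longest else 0.0


def contextual_distance(a, b):
    """Jaccard distance between two sets of surrounding words."""
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0


def candidate_distance(c1, c2, w_phone=0.7, w_context=0.3):
    """Weighted sum of feature distances (weights are illustrative only)."""
    return (w_phone * phonetic_distance(c1["phones"], c2["phones"])
            + w_context * contextual_distance(c1["context"], c2["context"]))


def bottom_up_cluster(candidates, threshold=0.4):
    """Average-linkage agglomerative clustering.

    Starts with one cluster per candidate and repeatedly merges the closest
    pair until the smallest inter-cluster distance exceeds `threshold`
    (a hypothetical stopping criterion chosen for this sketch).
    """
    clusters = [[i] for i in range(len(candidates))]

    def linkage(a, b):
        return sum(candidate_distance(candidates[i], candidates[j])
                   for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > 1:
        (p, q), best = min(
            (((p, q), linkage(clusters[p], clusters[q]))
             for p, q in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1])
        if best > threshold:
            break
        clusters[p] = clusters[p] + clusters[q]  # merge q into p (p < q)
        del clusters[q]
    return clusters


# Hypothetical OOV candidates: a decoded phone sequence plus context words.
candidates = [
    {"phones": ["p", "ih", "t", "s", "b", "er", "g"], "context": {"flight", "to"}},
    {"phones": ["p", "ih", "t", "s", "b", "er"], "context": {"to", "from"}},
    {"phones": ["d", "eh", "n", "v", "er"], "context": {"flight", "to"}},
]
clusters = bottom_up_cluster(candidates)
print(clusters)
```

In this toy input the first two candidates differ by a single phone and end up in the same cluster, while the third stays on its own. An acoustic distance could be added as a further weighted term in `candidate_distance` in the same way.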
Pages: 2241-2245
Number of pages: 5