Finding Recurrent Out-of-Vocabulary Words

Authors
Qin, Long [1]
Rudnicky, Alexander [1]
Affiliations
[1] Carnegie Mellon University, Language Technologies Institute, Pittsburgh, PA 15213, USA
Keywords
OOV word detection; distributed evidence; bottom-up clustering
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Out-of-vocabulary (OOV) words can appear more than once in a conversation or over a period of time. Multiple instances of the same OOV word provide valuable information for estimating the word's pronunciation or part-of-speech (POS) tag. In a conventional OOV word detection system, however, each OOV word is recognized and treated individually. We therefore investigated how to identify recurrent OOV words in speech recognition. Specifically, we propose to cluster multiple instances of the same OOV word using a bottom-up approach. Phonetic, acoustic, and contextual features were collected to measure the distance between OOV candidates. The experimental results show that the bottom-up clustering approach is very effective at detecting the recurrence of OOV words. We also found that the phonetic feature performs better than the acoustic and contextual features, and that the best performance is achieved when all features are combined.
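The bottom-up clustering idea in the abstract can be illustrated with a minimal sketch. This record does not give the paper's distance formulas, linkage rule, or stopping criterion, so everything below is an assumption made for illustration: only the phonetic feature is used, it is approximated by a length-normalized edit distance over phone sequences, clusters are merged by average linkage, and merging stops at a hypothetical threshold of 0.3. The phone sequences are invented examples, not data from the paper.

```python
from itertools import combinations


def edit_distance(a, b):
    """Levenshtein distance between two phone sequences."""
    prev = list(range(len(b) + 1))
    for i, pa in enumerate(a, start=1):
        curr = [i]
        for j, pb in enumerate(b, start=1):
            cost = 0 if pa == pb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def phonetic_distance(a, b):
    """Edit distance divided by the longer sequence length, in [0, 1]."""
    longest = max(len(a), len(b))
    return edit_distance(a, b) / longest if longest else 0.0


def cluster_bottom_up(candidates, threshold):
    """Average-linkage agglomerative clustering (assumed linkage rule).

    Starts with one cluster per candidate and repeatedly merges the
    closest pair until the smallest distance exceeds `threshold`.
    Returns clusters as lists of candidate indices.
    """
    clusters = [[i] for i in range(len(candidates))]

    def linkage(c1, c2):
        total = sum(
            phonetic_distance(candidates[i], candidates[j])
            for i in c1
            for j in c2
        )
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        (x, y), best = min(
            (
                ((x, y), linkage(clusters[x], clusters[y]))
                for x, y in combinations(range(len(clusters)), 2)
            ),
            key=lambda item: item[1],
        )
        if best > threshold:
            break
        # x < y is guaranteed by combinations, so deleting y is safe.
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return clusters


# Invented phone sequences standing in for decoded OOV candidates.
candidates = [
    ["P", "IH", "T", "S", "B", "ER", "G"],
    ["P", "IH", "T", "S", "B", "AH", "G"],
    ["K", "AA", "R", "N", "AH", "G", "IY"],
    ["K", "AA", "R", "N", "EY", "G", "IY"],
    ["R", "AH", "D", "N", "IH", "K", "IY"],
]
clusters = cluster_bottom_up(candidates, threshold=0.3)
print(clusters)  # [[0, 1], [2, 3], [4]]
```

In this toy run, candidates that differ by a single phone are grouped as recurrences of the same word, while the fifth candidate stays in its own cluster. Acoustic and contextual distances, which the abstract reports combining with the phonetic one, could be added as further terms in the distance function; how the paper weights them is not stated in this record.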
Pages: 2241-2245
Number of pages: 5