Finding Recurrent Out-of-Vocabulary Words

Cited by: 0
Authors
Qin, Long [1]
Rudnicky, Alexander [1]
Affiliations
[1] Carnegie Mellon Univ, Language Technol Inst, Pittsburgh, PA 15213 USA
Keywords
OOV word detection; distributed evidence; bottom-up clustering
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Out-of-vocabulary (OOV) words can appear more than once in a conversation or over a period of time. Such multiple instances of the same OOV word provide valuable information for estimating the pronunciation or the part-of-speech (POS) tag of the word. But in a conventional OOV word detection system, each OOV word is recognized and treated individually. We therefore investigated how to identify recurrent OOV words in speech recognition. Specifically, we propose to cluster multiple instances of the same OOV word using a bottom-up approach. Phonetic, acoustic and contextual features were collected to measure the distance between OOV candidates. The experimental results show that the bottom-up clustering approach is very effective at detecting the recurrence of OOV words. We also found that the phonetic feature is better than the acoustic and contextual features, and the best performance is achieved when combining all features.
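The bottom-up clustering described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the distance measures, so everything below is an assumption made for illustration: normalized edit distance over phone sequences stands in for the phonetic feature, Jaccard distance over neighboring words stands in for the contextual feature, the acoustic feature is left out, and the weights, the threshold, the average-linkage rule, and the toy candidates are all hypothetical.

```python
from itertools import combinations


def edit_distance(a, b):
    """Levenshtein distance between two phone sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (x != y)))
        prev = cur
    return prev[-1]


def phonetic_distance(a, b):
    """Edit distance normalized to [0, 1] by the longer sequence."""
    longest = max(len(a), len(b))
    return edit_distance(a, b) / longest if longest else 0.0


def contextual_distance(a, b):
    """Jaccard distance between two sets of surrounding words."""
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0


def candidate_distance(c1, c2, w_phone=0.7, w_ctx=0.3):
    """Weighted sum of feature distances (weights are assumed, not from the paper)."""
    return (w_phone * phonetic_distance(c1["phones"], c2["phones"])
            + w_ctx * contextual_distance(c1["context"], c2["context"]))


def cluster_distance(a, b, candidates):
    """Average linkage: mean pairwise distance between two clusters of indices."""
    total = sum(candidate_distance(candidates[i], candidates[j]) for i in a for j in b)
    return total / (len(a) * len(b))


def bottom_up_cluster(candidates, threshold=0.5):
    """Start with one cluster per candidate and repeatedly merge the closest
    pair until the closest pair is farther apart than the threshold."""
    clusters = [[i] for i in range(len(candidates))]
    while len(clusters) > 1:
        dist, a, b = min(
            (cluster_distance(clusters[a], clusters[b], candidates), a, b)
            for a, b in combinations(range(len(clusters)), 2)
        )
        if dist > threshold:
            break
        clusters[a].extend(clusters[b])
        del clusters[b]
    return clusters


# Toy OOV candidates (hypothetical): two decodings of one word, one of another.
candidates = [
    {"phones": ["k", "ae", "r", "n", "iy", "g", "iy"], "context": {"university", "mellon"}},
    {"phones": ["k", "aa", "r", "n", "ax", "g", "iy"], "context": {"mellon", "pittsburgh"}},
    {"phones": ["p", "ih", "t", "s", "b", "er", "g"], "context": {"city", "in"}},
]
clusters = bottom_up_cluster(candidates)
print(clusters)
```

In this toy run the first two candidates differ by two phones and share one context word, so they merge into one cluster, while the third candidate stays on its own. Each resulting cluster would then be treated as a set of instances of one recurrent OOV word.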
Pages: 2241-2245
Page count: 5