A Spoken Term Detection Framework for Recovering Out-of-Vocabulary Words Using the Web

Times cited: 0
Authors
Parada, Carolina [1]
Sethy, Abhinav [2]
Dredze, Mark [1]
Jelinek, Frederick [1]
Affiliations
[1] Johns Hopkins Univ, Ctr Language & Speech Proc, Human Language Technol Ctr Excellence, 3400 N Charles St, Baltimore, MD 21210 USA
[2] IBM TJ Watson Res Ctr, New York, NY 10598 USA
Keywords
language modeling; data selection; spoken term detection; OOV detection
DOI
Not available
Chinese Library Classification (CLC)
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Vocabulary restrictions in large vocabulary continuous speech recognition (LVCSR) systems mean that out-of-vocabulary (OOV) words are lost in the output. However, OOV words tend to be information-rich terms (often named entities), and their omission from the transcript negatively affects both usability and downstream NLP technologies such as machine translation or knowledge distillation. We propose a novel approach to OOV recovery that uses a spoken term detection (STD) framework. Given an identified OOV region in the LVCSR output, we recover the uttered OOVs by utilizing contextual information and the vast, constantly updated vocabulary on the Web. Discovered words are integrated into the system output, recovering up to 40% of OOVs and reducing system error.
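The abstract describes the recovery step only at a high level. As a rough illustration of the spoken-term-detection idea, here is a minimal sketch, assuming a phone sequence has already been decoded for the OOV region and that candidate words harvested from Web text come with pronunciations. It ranks candidates by length-normalized phone edit distance. This is not the paper's method: the function names, the ARPAbet-style phones, the 0.4 threshold and the example pronunciations are all hypothetical, and the contextual information the abstract mentions is ignored here.

```python
from typing import Dict, List, Sequence, Tuple


def phone_edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance between two phone sequences (unit costs)."""
    prev = list(range(len(b) + 1))
    for i, pa in enumerate(a, start=1):
        cur = [i] + [0] * len(b)
        for j, pb in enumerate(b, start=1):
            cur[j] = min(
                prev[j] + 1,               # delete a phone from `a`
                cur[j - 1] + 1,            # insert a phone from `b`
                prev[j - 1] + (pa != pb),  # substitute (free if phones match)
            )
        prev = cur
    return prev[len(b)]


def rank_candidates(
    oov_phones: Sequence[str],
    candidates: Dict[str, List[str]],
    max_norm_dist: float = 0.4,  # hypothetical acceptance threshold
) -> List[Tuple[str, float]]:
    """Rank hypothetical Web-derived candidate words by how closely their
    pronunciation matches the phones decoded in the OOV region."""
    scored = []
    for word, pron in candidates.items():
        dist = phone_edit_distance(oov_phones, pron)
        # Normalize by the longer sequence so scores fall in [0, 1].
        norm = dist / max(len(oov_phones), len(pron), 1)
        if norm <= max_norm_dist:
            scored.append((word, norm))
    # Lower distance is a better match; break ties alphabetically.
    scored.sort(key=lambda item: (item[1], item[0]))
    return scored
```

For example, with a decoded region `["k", "ae", "r", "ow", "l", "ay", "n", "ax"]` and candidates `carolina`, `caroline` and `karlin`, the first two survive the threshold and `carolina` ranks first. A real system would search phone lattices rather than a single 1-best phone string and would add contextual scores.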
Pages: 1269 / +
Page count: 2