A Spoken Term Detection Framework for Recovering Out-of-Vocabulary Words Using the Web

Cited by: 0
Authors
Parada, Carolina [1]
Sethy, Abhinav [2]
Dredze, Mark [1]
Jelinek, Frederick [1]
Affiliations
[1] Johns Hopkins Univ, Ctr Language & Speech Proc, Human Language Technol Ctr Excellence, 3400 N Charles St, Baltimore, MD 21210 USA
[2] IBM TJ Watson Res Ctr, New York, NY 10598 USA
Keywords
language modeling; data selection; spoken term detection; OOV detection
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Vocabulary restrictions in large vocabulary continuous speech recognition (LVCSR) systems mean that out-of-vocabulary (OOV) words are lost in the output. However, OOV words tend to be information-rich terms (often named entities), and their omission from the transcript negatively affects both usability and downstream NLP technologies, such as machine translation or knowledge distillation. We propose a novel approach to OOV recovery that uses a spoken term detection (STD) framework. Given an identified OOV region in the LVCSR output, we recover the uttered OOVs by utilizing contextual information and the vast and constantly updated vocabulary on the Web. Discovered words are integrated into system output, recovering up to 40% of OOVs and resulting in a reduction in system error.
Pages: 1269 / +
Page count: 2
Related papers
50 in total
  • [1] Direct Posterior Confidence for Out-of-Vocabulary Spoken Term Detection
    Wang, Dong
    King, Simon
    Frankel, Joe
    Vipperla, Ravichander
    Evans, Nicholas
    Troncy, Raphael
    ACM TRANSACTIONS ON INFORMATION SYSTEMS, 2012, 30 (03)
  • [2] Stochastic Pronunciation Modeling for Out-of-Vocabulary Spoken Term Detection
    Wang, Dong
    King, Simon
    Frankel, Joe
    IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2011, 19 (04) : 688 - 698
  • [3] Term-Dependent Confidence Normalisation for Out-of-Vocabulary Spoken Term Detection
    Javier Tejedor
    Simon King
    Joe Frankel
    Journal of Computer Science & Technology, 2012, 27 (02) : 358 - 375
  • [4] Term-Dependent Confidence Normalisation for Out-of-Vocabulary Spoken Term Detection
    Wang, Dong
    Tejedor, Javier
    King, Simon
    Frankel, Joe
    JOURNAL OF COMPUTER SCIENCE AND TECHNOLOGY, 2012, 27 (02) : 358 - 375
  • [5] Term-Dependent Confidence Normalisation for Out-of-Vocabulary Spoken Term Detection
    Dong Wang
    Javier Tejedor
    Simon King
    Joe Frankel
    Journal of Computer Science and Technology, 2012, 27 : 358 - 375
  • [6] STOCHASTIC PRONUNCIATION MODELLING AND SOFT MATCH FOR OUT-OF-VOCABULARY SPOKEN TERM DETECTION
    Wang, Dong
    King, Simon
    Frankel, Joe
    Bell, Peter
    2010 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2010, : 5294 - 5297
  • [7] Detection of Out-of-Vocabulary Words in Posterior Based ASR
    Ketabdar, Hamed
    Hannemann, Mirko
    Hermansky, Hynek
    INTERSPEECH 2007: 8TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION, VOLS 1-4, 2007, : 2772 - 2775
  • [8] CRF-based Stochastic Pronunciation Modeling for Out-of-Vocabulary Spoken Term Detection
    Wang, Dong
    King, Simon
    Evans, Nicholas
    Troncy, Raphael
    11TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2010 (INTERSPEECH 2010), VOLS 3 AND 4, 2010, : 1668 - +
  • [9] Addressing the Out-Of-Vocabulary Problem for Large-Scale Chinese Spoken Term Detection
    Meng, Sha
    Shao, Jian
    Yu, Roger Peng
    Liu, Jia
    Seide, Frank
    INTERSPEECH 2008: 9TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2008, VOLS 1-5, 2008, : 2146 - +
  • [10] Chinese Word Segmentation and Out-Of-Vocabulary Words Detection Using Suffix Array
    Ji Wenyan
    Peng Tao
    Zuo Wanli
    He Fengling
    Zhu Huifeng
    WISM: 2009 INTERNATIONAL CONFERENCE ON WEB INFORMATION SYSTEMS AND MINING, PROCEEDINGS, 2009, : 56 - 60