A Spoken Term Detection Framework for Recovering Out-of-Vocabulary Words Using the Web

Authors
Parada, Carolina [1]
Sethy, Abhinav [2]
Dredze, Mark [1]
Jelinek, Frederick [1]
Affiliations
[1] Johns Hopkins Univ, Ctr Language & Speech Proc, Human Language Technol Ctr Excellence, 3400 N Charles St, Baltimore, MD 21210 USA
[2] IBM TJ Watson Res Ctr, New York, NY 10598 USA
Keywords
language modeling; data selection; spoken term detection; OOV detection
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Vocabulary restrictions in large vocabulary continuous speech recognition (LVCSR) systems mean that out-of-vocabulary (OOV) words are lost in the output. However, OOV words tend to be information-rich terms (often named entities), and their omission from the transcript negatively affects both usability and downstream NLP technologies, such as machine translation or knowledge distillation. We propose a novel approach to OOV recovery that uses a spoken term detection (STD) framework. Given an identified OOV region in the LVCSR output, we recover the uttered OOVs by utilizing contextual information and the vast and constantly updated vocabulary on the Web. Discovered words are integrated into system output, recovering up to 40% of OOVs and resulting in a reduction in system error.
Pages: 1269 / +
Page count: 2