A Spoken Term Detection Framework for Recovering Out-of-Vocabulary Words Using the Web

Cited by: 0

Authors:

Parada, Carolina [1]

Sethy, Abhinav [2]

Dredze, Mark [1]

Jelinek, Frederick [1]

Affiliations:

[1] Johns Hopkins Univ, Ctr Language & Speech Proc, Human Language Technol Ctr Excellence, 3400 N Charles St, Baltimore, MD 21210 USA

[2] IBM TJ Watson Res Ctr, New York, NY 10598 USA

Source:

11TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2010 (INTERSPEECH 2010), VOLS 1-2 | 2010

Keywords:

language modeling; data selection; spoken term detection; OOV detection

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Vocabulary restrictions in large vocabulary continuous speech recognition (LVCSR) systems mean that out-of-vocabulary (OOV) words are lost in the output. However, OOV words tend to be information-rich terms (often named entities), and their omission from the transcript negatively affects both usability and downstream NLP technologies, such as machine translation or knowledge distillation. We propose a novel approach to OOV recovery that uses a spoken term detection (STD) framework. Given an identified OOV region in the LVCSR output, we recover the uttered OOVs by utilizing contextual information and the vast and constantly updated vocabulary on the Web. Discovered words are integrated into the system output, recovering up to 40% of OOVs and resulting in a reduction in system error.


Pages: 1269 / +

Page count: 2
