Direct Posterior Confidence for Out-of-Vocabulary Spoken Term Detection

Cited by: 6

Authors:

Wang, Dong [1]

King, Simon [2]

Frankel, Joe [2]

Vipperla, Ravichander [3]

Evans, Nicholas [3]

Troncy, Raphael [3]

Affiliations:

[1] Nuance Commun, Aachen, Germany

[2] Univ Edinburgh, CSTR, Edinburgh EH8 9AB, Midlothian, Scotland

[3] EURECOM, Multimedia Dept, F-06904 Sophia Antipolis, France

Source:

ACM Transactions on Information Systems | 2012, Vol. 30, No. 3

Funding:

Engineering and Physical Sciences Research Council (EPSRC), UK

Keywords:

Speech recognition; spontaneous speech search; spoken term detection; discriminative utterance verification; minimum verification; OOV queries; word; phone; system; error; retrieval; search

DOI:

10.1145/2328967.2328969

Chinese Library Classification number:

TP [Automation technology, computer technology]

Discipline classification code:

0812

Abstract:

Spoken term detection (STD) is a key technology for spoken information retrieval. Unlike conventional speech transcription and keyword spotting, STD is an open-vocabulary task and has to address out-of-vocabulary (OOV) terms. Approaches based on subword units, for example phones, are widely used to solve the OOV issue; however, performance on OOV terms is still substantially inferior to that on in-vocabulary (INV) terms. The performance degradation on OOV terms can be attributed to a multitude of factors. One particular factor we address in this article is the unreliable confidence estimation caused by weak acoustic and language modeling due to the absence of OOV terms in the training corpora. We propose a direct posterior confidence derived from a discriminative model, such as a multilayer perceptron (MLP). The new confidence considers a wide-range acoustic context, which is usually important for speech recognition and retrieval; moreover, it localizes on detected speech segments and therefore avoids the impact of long-span word context, which is usually unreliable for OOV term detection. In this article, we first develop an extensive discussion of the modeling weakness problem associated with OOV terms, and then propose our approach to address this problem based on direct posterior confidence. Our experiments, carried out on spontaneous and conversational multiparty meeting speech, demonstrate that the proposed technique provides a significant improvement in STD performance compared with conventional lattice-based confidence, in particular for OOV terms. Furthermore, the new confidence estimation approach is fused with other advanced techniques for OOV treatment, such as stochastic pronunciation modeling and discriminative confidence normalization. This leads to an integrated solution for OOV term detection that results in a large performance improvement.
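The abstract does not give the formula for the direct posterior confidence, so the following is only an illustrative sketch of the general idea: a confidence score computed directly from frame-level phone posteriors (as a discriminative model such as an MLP might output) and restricted to the detected segment. The function name `direct_posterior_confidence`, the alignment layout, the geometric-mean scoring rule, and the toy posteriors are all assumptions made for illustration, not the authors' method.

```python
import math


def direct_posterior_confidence(frame_posteriors, alignment, floor=1e-10):
    """Hypothetical segment-level confidence from frame-level phone posteriors.

    frame_posteriors[t][k] is assumed to be P(phone k | acoustic context at
    frame t), e.g. the softmax output of an MLP fed a window of frames.
    alignment is a list of (phone_index, start_frame, end_frame) tuples for
    the detected term, with end_frame exclusive.

    Returns the geometric mean of the aligned phones' posteriors over the
    segment (an assumed scoring rule, not taken from the paper).
    """
    log_sum = 0.0
    n_frames = 0
    for phone, start, end in alignment:
        for t in range(start, end):
            # Floor the posterior so that log() never receives zero.
            p = max(frame_posteriors[t][phone], floor)
            log_sum += math.log(p)
            n_frames += 1
    if n_frames == 0:
        raise ValueError("alignment covers no frames")
    return math.exp(log_sum / n_frames)


# Toy data: 4 frames, 3 phone classes (made-up numbers).
posteriors = [
    [0.9, 0.1, 0.0],
    [0.8, 0.2, 0.0],
    [0.1, 0.9, 0.0],
    [0.1, 0.9, 0.0],
]
# Detected term = phone 0 on frames 0-1, then phone 1 on frames 2-3.
alignment = [(0, 0, 2), (1, 2, 4)]
confidence = direct_posterior_confidence(posteriors, alignment)
```

Because the score uses only the frames inside the detected segment, it does not depend on the surrounding word context, which is the localization property the abstract highlights. The wide acoustic context would enter through the input window of the posterior model, which this sketch does not implement.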

Pages: 34

