Zero-resource audio-only spoken term detection based on a combination of template matching techniques

Cited by: 0

Authors:

Muscariello, Armando [1]

Gravier, Guillaume [1]

Bimbot, Frederic [1]

Affiliations:

[1] IRISA CNRS UMR 6074, Paris, France

Source:

12TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2011 (INTERSPEECH 2011), VOLS 1-5 | 2011

Keywords:

spoken term detection; template matching; unsupervised learning; posterior features

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Spoken term detection is a well-known information retrieval task that seeks to extract meaningful information from audio by locating occurrences of known query words of interest. This paper describes a zero-resource approach to this task based on pattern matching of spoken term queries at the acoustic level. The template matching module comprises a cascade of a segmental variant of dynamic time warping and a self-similarity matrix comparison to further improve robustness to speech variability. This solution differs notably from more traditional train-and-test methods that, while shown to be very accurate, rely upon the availability of large amounts of linguistic resources. We evaluate our framework on different parameterizations of the speech templates: raw MFCC features, Gaussian posteriorgrams, and French and English phonetic posteriorgrams output by two different state-of-the-art phoneme recognizers.

Citation

Pages: 928-931

Number of pages: 4
