Zero-resource audio-only spoken term detection based on a combination of template matching techniques

Authors
Muscariello, Armando [1]
Gravier, Guillaume [1]
Bimbot, Frederic [1]
Affiliations
[1] IRISA CNRS UMR 6074, Paris, France
Source
12TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2011 (INTERSPEECH 2011), VOLS 1-5 | 2011
Keywords
spoken term detection; template matching; unsupervised learning; posterior features
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Spoken term detection is a well-known information retrieval task that seeks to extract meaningful content from audio by locating occurrences of known query words of interest. This paper describes a zero-resource approach to this task based on pattern matching of spoken term queries at the acoustic level. The template matching module is a cascade of a segmental variant of dynamic time warping and a self-similarity matrix comparison, the latter added to further improve robustness to speech variability. This solution differs notably from more traditional train-and-test methods which, while shown to be very accurate, rely on the availability of large amounts of linguistic resources. We evaluate our framework on different parameterizations of the speech templates: raw MFCC features, Gaussian posteriorgrams, and French and English phonetic posteriorgrams output by two different state-of-the-art phoneme recognizers.
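The cascade named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it uses plain dynamic time warping rather than their segmental variant, and the cosine frame distance, the nearest-frame resampling, the SSM size of 16, and all function names are assumptions chosen for illustration only.

```python
import numpy as np

def cosine_distance_matrix(a, b):
    """Pairwise cosine distance between frames of a (n, d) and b (m, d)."""
    a_n = a / (np.linalg.norm(a, axis=1, keepdims=True) + 1e-12)
    b_n = b / (np.linalg.norm(b, axis=1, keepdims=True) + 1e-12)
    return 1.0 - a_n @ b_n.T

def dtw_cost(query, segment):
    """Length-normalized cost of the best alignment (plain DTW, an assumption;
    the paper uses a segmental variant that is not reproduced here)."""
    dist = cosine_distance_matrix(query, segment)
    n, m = dist.shape
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best_prev = min(acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1])
            acc[i, j] = dist[i - 1, j - 1] + best_prev
    return acc[n, m] / (n + m)

def self_similarity_matrix(x):
    """Frame-by-frame distance of a sequence to itself."""
    return cosine_distance_matrix(x, x)

def ssm_distance(query, segment, size=16):
    """Mean absolute difference between the two self-similarity matrices,
    after resampling both sequences to `size` frames (hypothetical choice)."""
    def resample(x):
        idx = np.linspace(0, len(x) - 1, size).round().astype(int)
        return x[idx]
    q = self_similarity_matrix(resample(query))
    s = self_similarity_matrix(resample(segment))
    return float(np.mean(np.abs(q - s)))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    query = rng.random((20, 13))              # stand-in for 13-dim MFCC frames
    stretched = np.repeat(query, 2, axis=0)   # same content, spoken twice as slowly
    other = rng.random((35, 13))              # unrelated segment
    print("DTW  query vs stretched:", dtw_cost(query, stretched))
    print("DTW  query vs other:    ", dtw_cost(query, other))
    print("SSM  query vs other:    ", ssm_distance(query, other))
```

In a cascade of this kind, the DTW score would typically select candidate segments and the self-similarity comparison would then confirm or reject them; the thresholds for both stages are left out here because the record does not state them.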
Pages: 928-931
Page count: 4