Phonetic Spoken Term Detection in Large Audio Archive Using the WFST Framework

Authors
Vavruska, Jan [1 ]
Svec, Jan [1 ]
Ircing, Pavel [1 ]
Affiliation
[1] Univ W Bohemia, Dept Cybernet, Plzen 30614, Czech Republic
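Source
TEXT, SPEECH, AND DIALOGUE, TSD 2013 | 2013, Vol. 8082
Keywords
spoken term detection; finite state automata
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract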
The paper presents a technique for phonetic spoken term detection in large audio archives. It is designed within the framework of weighted finite-state transducers (WFSTs) and utilizes the recently developed notion of factor automata, which we have enhanced with score normalization and a technique for systematic query expansion. The expansion allows for phone deletions and substitutions and consequently compensates for frequent pronunciation imperfections and for systematic phoneme interchanges that occur during the ASR decoding process. The experiments presented in the paper show that the new WFST-based method outperforms the baseline system in terms of both search performance and speed. Finally, the paper discusses the issues with the proposed techniques that need to be addressed before they can be applied to real-life tasks.
Pages: 402-409
Page count: 8