Learning a Discriminative Weighted Finite-State Transducer for Speech Recognition

Cited by: 17
Authors
Lehr, Maider [1]
Shafran, Izhak [1]
Affiliations
[1] Oregon Hlth & Sci Univ, Ctr Spoken Language Understanding, Portland, OR 97239 USA
Source
IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING | 2011 / Vol. 19 / No. 05
Keywords
Acoustic modeling; discriminative learning; duration modeling; finite-state transducers; language modeling; learning finite-state transducers
DOI
10.1109/TASL.2010.2090518
Chinese Library Classification
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Weighted finite-state transducers (WFSTs) have been widely adopted as efficient representations of a general speech recognition model. The WFST for a speech recognizer is typically composed from several components (the language model, the pronunciation mapping, and the acoustic model) that are estimated separately, without any end-to-end optimization. This paper examines how the weights of such transducers can be learned in a manner that captures the interaction between the components. The paths in the transducer are represented as n-grams defined over the input and output sequences, whose linear weights are learned using a discriminative criterion. The resulting linear model factors into two weighted finite-state acceptors (WFSAs), which can be applied as corrections to the input and output sides of the initial WFST. This formulation allows duration cues to be incorporated seamlessly. Empirical results on a large-vocabulary Arabic GALE task demonstrate that the proposed model improves word error rate substantially, with a gain of 1.5%-1.7% absolute. Through a series of experiments, we analyze the contributions from, and interactions between, the acoustic, duration, and language components, and find that duration cues play an important role in a large-vocabulary Arabic speech recognition task. Although this paper focuses on speech recognition, the proposed framework for learning the weights of a finite-state transducer is more general and can be applied to other tasks such as utterance classification.
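The idea of scoring paths with n-gram features over the input and output sequences, and then splitting the learned weights into an input-side and an output-side correction, can be illustrated with a small sketch. The abstract does not state which discriminative criterion is used, so the perceptron update below is an assumption chosen for brevity. The candidate list stands in for paths through the WFST, and every name (`ngram_counts`, `path_features`, `train`, `split_by_side`) and the `(input_symbols, output_symbols, base_cost)` layout are hypothetical.

```python
from collections import Counter, defaultdict


def ngram_counts(symbols, max_order, side):
    """Count n-grams of order 1..max_order on one side ("in" or "out") of a path."""
    padded = ["<s>"] + list(symbols) + ["</s>"]
    counts = Counter()
    for order in range(1, max_order + 1):
        for i in range(len(padded) - order + 1):
            counts[(side, tuple(padded[i:i + order]))] += 1
    return counts


def path_features(path, max_order=2):
    """path = (input_symbols, output_symbols, base_cost); a hypothetical layout."""
    inputs, outputs, _ = path
    return ngram_counts(inputs, max_order, "in") + ngram_counts(outputs, max_order, "out")


def score(path, weights, max_order=2):
    """Higher is better: negated base cost plus the learned linear correction."""
    feats = path_features(path, max_order)
    return -path[2] + sum(weights.get(f, 0.0) * v for f, v in feats.items())


def decode(candidates, weights, max_order=2):
    """Return the index of the best-scoring candidate path."""
    return max(range(len(candidates)),
               key=lambda i: score(candidates[i], weights, max_order))


def train(data, epochs=5, max_order=2):
    """Perceptron training (an assumed criterion, not taken from the paper).

    data is a list of (candidates, oracle_index) pairs.
    """
    weights = defaultdict(float)
    for _ in range(epochs):
        for candidates, oracle in data:
            best = decode(candidates, weights, max_order)
            if best != oracle:
                # Reward the oracle path's n-grams, penalize the wrong winner's.
                for f, v in path_features(candidates[oracle], max_order).items():
                    weights[f] += v
                for f, v in path_features(candidates[best], max_order).items():
                    weights[f] -= v
    return weights


def split_by_side(weights):
    """Separate nonzero weights into input-side and output-side n-gram tables."""
    in_w, out_w = {}, {}
    for (side, gram), w in weights.items():
        if w != 0.0:
            (in_w if side == "in" else out_w)[gram] = w
    return in_w, out_w


# Toy example: the baseline prefers candidate 0, but candidate 1 is the oracle.
candidates = [(["a", "b"], ["x"], 1.0), (["a", "c"], ["y"], 1.5)]
learned = train([(candidates, 1)], epochs=2)
input_side, output_side = split_by_side(learned)
```

Because each feature is tied to exactly one side of the path, the learned weights separate cleanly into two tables, which is the property that lets the linear model be expressed as two acceptors in the abstract's formulation. Building actual WFSAs from these tables is outside the scope of this sketch.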
Pages: 1360-1367
Page count: 8
Related papers
50 in total
  • [1] Juicer: A weighted finite-state transducer speech decoder
    Moore, Darren
    Dines, John
    Doss, Mathew Magimai
    Vepa, Jithendra
    Cheng, Octavian
    Hain, Thomas
    MACHINE LEARNING FOR MULTIMODAL INTERACTION, 2006, 4299 : 285 - +
  • [2] Weighted finite-state transducers in speech recognition
    Mohri, M
    Pereira, F
    Riley, M
    COMPUTER SPEECH AND LANGUAGE, 2002, 16 (01) : 69 - 88
  • [3] A GENERAL DISCRIMINATIVE TRAINING ALGORITHM FOR SPEECH RECOGNITION USING WEIGHTED FINITE-STATE TRANSDUCERS
    Zhao, Yong
    Ljolje, Andrej
    Caseiro, Diamantino
    Juang, Biing-Hwang
    2012 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2012, : 4217 - 4220
  • [4] Weighted Finite-State Transducer Approach to German Compound Words Reconstruction for Speech Recognition
    Shamraev, Nickolay
    Batalshchikov, Alexander
    Zulkarneev, Mikhail
    Repalov, Sergey
    Shirokova, Anna
    2015 ARTIFICIAL INTELLIGENCE AND NATURAL LANGUAGE AND INFORMATION EXTRACTION, SOCIAL MEDIA AND WEB SEARCH FRUCT CONFERENCE (AINL-ISMW FRUCT), 2015, : 96 - 101
  • [5] A study of biasing technical terms in medical speech recognition using weighted finite-state transducer
    Kojima, Atsushi
    ACOUSTICAL SCIENCE AND TECHNOLOGY, 2022, 43 (01) : 66 - 68
  • [6] Optimization of Weighted Finite State Transducer for Speech Recognition
    Aubert, Louis-Marie
    Woods, Roger
    Fischaber, Scott
    Veitch, Richard
    IEEE TRANSACTIONS ON COMPUTERS, 2013, 62 (08) : 1607 - 1615
  • [7] Hidden semi-Markov model based speech recognition system using weighted finite-state transducer
    Oura, Keiichiro
    Zen, Heiga
    Nankaku, Yoshihiko
    Lee, Akinobu
    Tokuda, Keiichi
    2006 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, VOLS 1-13, 2006, : 33 - 36
  • [8] Weighted finite-state transducer inference for limited-domain speech-to-speech translation
    Caseiro, D
    Trancoso, I
    COMPUTATIONAL PROCESSING OF THE PORTUGUESE LANGUAGE, PROCEEDINGS, 2006, 3960 : 60 - 68
  • [9] A Multiplatform Speech Recognition Decoder Based on Weighted Finite-State Transducers
    Stoimenov, Emilian
    Schultz, Tanja
    2009 IEEE WORKSHOP ON AUTOMATIC SPEECH RECOGNITION & UNDERSTANDING (ASRU 2009), 2009, : 293 - 298
  • [10] Fast speech recognition system using weighted finite-state transducers
    Guo, Yuhong
    Li, Ta
    Zhao, Xuemin
    Pan, Jielin
    Yan, Yonghong
    Journal of Information and Computational Science, 2012, 9 (18) : 5807 - 5814