Learning a Discriminative Weighted Finite-State Transducer for Speech Recognition

Cited by: 17
Authors
Lehr, Maider [1]
Shafran, Izhak [1]
Affiliations
[1] Oregon Hlth & Sci Univ, Ctr Spoken Language Understanding, Portland, OR 97239 USA
Source
IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING | 2011 / Vol. 19 / No. 05
Keywords
Acoustic modeling; discriminative learning; duration modeling; finite-state transducers; language modeling; learning finite-state transducers
DOI
10.1109/TASL.2010.2090518
Chinese Library Classification number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Weighted finite-state transducers (WFSTs) have been widely adopted as efficient representations of a general speech recognition model. The WFST for a speech recognizer is typically composed from several components (the language model, the pronunciation mapping, and the acoustic model) that are estimated separately, without any end-to-end optimization. This paper examines how the weights of such transducers can be learned in a manner that captures the interaction between the components. The paths in the transducer are represented as n-grams defined over the input and output sequences, and their linear weights are learned using a discriminative criterion. The resulting linear model factors into two weighted finite-state acceptors (WFSAs), which can be applied as corrections to the input and the output side of the initial WFST. This formulation allows duration cues to be incorporated seamlessly. Empirical results on a large-vocabulary Arabic GALE task demonstrate that the proposed model reduces word error rate by 1.5%-1.7% absolute. Through a series of experiments, we analyze the contributions from, and interactions between, the acoustic, duration, and language components, and find that duration cues play an important role in a large-vocabulary Arabic speech recognition task. Although this paper focuses on speech recognition, the proposed framework for learning the weights of a finite-state transducer is more general in nature and can be applied to other tasks such as utterance classification.
Pages: 1360-1367
Number of pages: 8
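
The abstract describes paths scored by n-gram features over the input and output sequences, with linear weights learned discriminatively. The following is a minimal sketch of that idea and not the authors' implementation. It assumes a structured-perceptron update as the discriminative criterion (the abstract does not name one), and it reranks a short candidate list instead of operating on a WFST. All names (`ngrams`, `features`, `score`, `perceptron_update`) are hypothetical. Because each feature is tagged with its side, the learned weights split into an input-side set and an output-side set, loosely mirroring the factoring into two WFSAs.

```python
from collections import Counter


def ngrams(seq, n, side):
    """Count n-grams of order 1..n over one side ('in' or 'out') of a path."""
    padded = ["<s>"] + list(seq) + ["</s>"]
    feats = Counter()
    for k in range(1, n + 1):
        for i in range(len(padded) - k + 1):
            feats[(side, tuple(padded[i:i + k]))] += 1
    return feats


def features(path, n=2):
    """A path is (input_sequence, output_sequence); sides never share features."""
    inp, out = path
    feats = ngrams(inp, n, "in")
    feats.update(ngrams(out, n, "out"))  # Counter.update adds counts
    return feats


def score(weights, path, base=0.0):
    """Baseline score (assumed to come from the initial model) plus a linear correction."""
    return base + sum(weights.get(k, 0.0) * v for k, v in features(path).items())


def perceptron_update(weights, candidates, oracle_index, lr=1.0):
    """One perceptron step (an assumed criterion, not necessarily the paper's).

    candidates: list of (path, base_score). Returns the index chosen
    before the update; weights are modified in place on a mistake.
    """
    best = max(range(len(candidates)),
               key=lambda i: score(weights, candidates[i][0], candidates[i][1]))
    if best != oracle_index:
        for k, v in features(candidates[oracle_index][0]).items():
            weights[k] = weights.get(k, 0.0) + lr * v
        for k, v in features(candidates[best][0]).items():
            weights[k] = weights.get(k, 0.0) - lr * v
    return best
```

To use it, build `candidates` as a list of `((input_symbols, output_words), base_score)` pairs, call `perceptron_update` with the index of the lowest-error candidate, and afterwards partition `weights` by the first element of each key (`"in"` or `"out"`) to obtain the two sets of corrections.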