Learning a Discriminative Weighted Finite-State Transducer for Speech Recognition

Cited by: 17

Authors:

Lehr, Maider [1]

Shafran, Izhak [1]

Affiliation:

[1] Oregon Hlth & Sci Univ, Ctr Spoken Language Understanding, Portland, OR 97239 USA

Source:

IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING | 2011 / Vol. 19 / No. 05

Keywords:

Acoustic modeling; discriminative learning; duration modeling; finite-state transducers; language modeling; learning finite-state transducers;

DOI:

10.1109/TASL.2010.2090518

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Discipline classification codes:

070206 ; 082403 ;

Abstract:

Weighted finite-state transducers (WFSTs) have been widely adopted as efficient representations of a general speech recognition model. The WFST for a speech recognizer is typically assembled, or composed, from several components (the language model, the pronunciation mapping, and the acoustic model) that are estimated separately, without any end-to-end optimization. This paper examines how the weights of such transducers can be learned in a manner that captures the interaction between the components. The paths in the transducer are represented as n-grams defined over the input and output sequences, and the linear weights of these n-grams are learned using a discriminative criterion. The resulting linear model factors into two weighted finite-state acceptors (WFSAs), which can be applied as corrections to the input side and the output side of the initial WFST. This formulation also allows duration cues to be incorporated seamlessly. Empirical results on a large-vocabulary Arabic GALE task demonstrate that the proposed model improves word error rate substantially, with an absolute gain of 1.5%-1.7%. Through a series of experiments, we analyze the contributions of, and interactions between, the acoustic, duration, and language components and find that duration cues play an important role in large-vocabulary Arabic speech recognition. Although this paper focuses on speech recognition, the proposed framework for learning the weights of a finite-state transducer is more general and can be applied to other tasks such as utterance classification.
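The core idea in the abstract, scoring each transducer path with linear weights on input-side and output-side n-grams learned discriminatively, can be sketched as a reranker over n-best hypotheses. This is a minimal illustration under assumptions the abstract does not state: the averaged perceptron stands in for "a discriminative criterion", each hypothesis is given as an input-side sequence (e.g. phones), an output-side sequence (e.g. words), and a baseline log-score from the initial WFST, and every name below (`train_perceptron`, `split_by_side`, etc.) is hypothetical. It does not build real WFSAs; the two tables returned by `split_by_side` only correspond to what the input-side and output-side correction acceptors would encode.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

# Hypothetical hypothesis format: (input-side symbols, output-side symbols,
# baseline log-score from the initial WFST; higher is better).
Hyp = Tuple[Sequence[str], Sequence[str], float]
Feature = Tuple[str, Tuple[str, ...]]  # ("in" | "out", n-gram)


def ngrams(seq: Sequence[str], order: int) -> List[Tuple[str, ...]]:
    """All n-grams of length 1..order over the boundary-padded sequence."""
    padded = ["<s>", *seq, "</s>"]
    return [tuple(padded[i:i + n])
            for n in range(1, order + 1)
            for i in range(len(padded) - n + 1)]


def features(hyp: Hyp, order: int = 2) -> Counter:
    """Count input-side and output-side n-grams as separate features."""
    inp, out, _ = hyp
    f: Counter = Counter()
    f.update(("in", g) for g in ngrams(inp, order))
    f.update(("out", g) for g in ngrams(out, order))
    return f


def score(w: Dict[Feature, float], hyp: Hyp, order: int = 2) -> float:
    """Baseline WFST score plus the learned linear correction."""
    return hyp[2] + sum(w.get(k, 0.0) * c for k, c in features(hyp, order).items())


def train_perceptron(data: List[Tuple[List[Hyp], int]], epochs: int = 5,
                     order: int = 2) -> Dict[Feature, float]:
    """Averaged perceptron over n-best lists; data = [(nbest, oracle_index)]."""
    w: Dict[Feature, float] = {}
    total: Dict[Feature, float] = {}  # running sum of weights for averaging
    steps = 0
    for _ in range(epochs):
        for nbest, oracle in data:
            best = max(range(len(nbest)), key=lambda i: score(w, nbest[i], order))
            if best != oracle:
                # Move weights toward the oracle's features, away from the error's.
                delta = features(nbest[oracle], order)
                delta.subtract(features(nbest[best], order))
                for k, v in delta.items():
                    w[k] = w.get(k, 0.0) + v
            steps += 1
            for k, v in w.items():
                total[k] = total.get(k, 0.0) + v
    return {k: v / steps for k, v in total.items()}


def split_by_side(w: Dict[Feature, float]) -> Tuple[Dict, Dict]:
    """Separate nonzero weights into input-side and output-side n-gram tables."""
    inp = {g: v for (side, g), v in w.items() if side == "in" and v != 0.0}
    out = {g: v for (side, g), v in w.items() if side == "out" and v != 0.0}
    return inp, out
```

Because every feature belongs to exactly one side, the learned score is a sum of an input-side term and an output-side term, which is why such a model can factor into two separate acceptors. Duration information could, hypothetically, enter the same way, as extra input-side tokens.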


Pages: 1360 - 1367

Number of pages: 8

50 items in total

[1] Juicer: A weighted finite-state transducer speech decoder
Moore, Darren
Dines, John
Doss, Mathew Magimai
Vepa, Jithendra
Cheng, Octavian
Hain, Thomas
MACHINE LEARNING FOR MULTIMODAL INTERACTION, 2006, 4299: 285 - +
[2] Weighted finite-state transducers in speech recognition
Mohri, M
Pereira, F
Riley, M
COMPUTER SPEECH AND LANGUAGE, 2002, 16 (01): 69 - 88
[3] A GENERAL DISCRIMINATIVE TRAINING ALGORITHM FOR SPEECH RECOGNITION USING WEIGHTED FINITE-STATE TRANSDUCERS
Zhao, Yong
Ljolje, Andrej
Caseiro, Diamantino
Juang, Biing-Hwang
2012 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2012: 4217 - 4220
[4] Weighted Finite-State Transducer Approach to German Compound Words Reconstruction for Speech Recognition
Shamraev, Nickolay
Batalshchikov, Alexander
Zulkarneev, Mikhail
Repalov, Sergey
Shirokova, Anna
2015 ARTIFICIAL INTELLIGENCE AND NATURAL LANGUAGE AND INFORMATION EXTRACTION, SOCIAL MEDIA AND WEB SEARCH FRUCT CONFERENCE (AINL-ISMW FRUCT), 2015: 96 - 101
[5] A study of biasing technical terms in medical speech recognition using weighted finite-state transducer
Kojima, Atsushi
ACOUSTICAL SCIENCE AND TECHNOLOGY, 2022, 43 (01): 66 - 68
[6] Optimization of Weighted Finite State Transducer for Speech Recognition
Aubert, Louis-Marie
Woods, Roger
Fischaber, Scott
Veitch, Richard
IEEE TRANSACTIONS ON COMPUTERS, 2013, 62 (08): 1607 - 1615
[7] Hidden semi-Markov model based speech recognition system using weighted finite-state transducer
Oura, Keiichiro
Zen, Heiga
Nankaku, Yoshihiko
Lee, Akinobu
Tokuda, Keiichi
2006 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, VOLS 1-13, 2006: 33 - 36
[8] Weighted finite-state transducer inference for limited-domain speech-to-speech translation
Caseiro, D
Trancoso, I
COMPUTATIONAL PROCESSING OF THE PORTUGUESE LANGUAGE, PROCEEDINGS, 2006, 3960: 60 - 68
[9] A Multiplatform Speech Recognition Decoder Based on Weighted Finite-State Transducers
Stoimenov, Emilian
Schultz, Tanja
2009 IEEE WORKSHOP ON AUTOMATIC SPEECH RECOGNITION & UNDERSTANDING (ASRU 2009), 2009: 293 - 298
[10] Fast speech recognition system using weighted finite-state transducers
Guo, Yuhong
Li, Ta
Zhao, Xuemin
Pan, Jielin
Yan, Yonghong
JOURNAL OF INFORMATION AND COMPUTATIONAL SCIENCE, 2012, 9 (18): 5807 - 5814
