Learning a Discriminative Weighted Finite-State Transducer for Speech Recognition

Cited by: 17

Authors:

Lehr, Maider ^{[1]}

Shafran, Izhak ^{[1]}

Affiliations:

[1] Oregon Hlth & Sci Univ, Ctr Spoken Language Understanding, Portland, OR 97239 USA

Source:

IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING | 2011 / Vol. 19 / No. 05

Keywords:

Acoustic modeling; discriminative learning; duration modeling; finite-state transducers; language modeling; learning finite-state transducers;

DOI:

10.1109/TASL.2010.2090518

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Subject classification codes:

070206 ; 082403 ;

Abstract:

Weighted finite-state transducers (WFSTs) have been widely adopted as efficient representations of a general speech recognition model. The WFST for a speech recognizer is typically composed from several components (the language model, the pronunciation mapping, and the acoustic model), which are estimated separately without any end-to-end optimization. This paper examines how the weights of such transducers can be learned in a manner that captures the interaction between the components. The paths in the transducer are represented as n-grams defined over the input and output sequences, whose linear weights are learned using a discriminative criterion. The resulting linear model factors into two weighted finite-state acceptors (WFSAs), which can be applied as corrections to the input and the output side of the initial WFST. This formulation allows duration cues to be incorporated seamlessly. Empirical results on a large-vocabulary Arabic GALE task demonstrate that the proposed model improves word error rate substantially, with a gain of 1.5%-1.7% absolute. Through a series of experiments, we analyze the contributions from and interactions between the acoustic, duration, and language components, and find that duration cues play an important role in a large-vocabulary Arabic speech recognition task. Although this paper focuses on speech recognition, the proposed framework for learning the weights of a finite-state transducer is more general in nature and can be applied to other tasks such as utterance classification.
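
The idea of scoring a path by n-grams over its input and output sequences can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the abstract only says "a discriminative criterion", so the perceptron-style update is a stand-in, the hypotheses are a plain N-best list rather than a WFST, and the function names (`ngram_features`, `path_features`, `perceptron_update`, `split_by_side`) are hypothetical and do not come from the paper.

```python
from collections import Counter

def ngram_features(seq, n, side):
    """Count n-grams of order 1..n over one side (input or output) of a path."""
    feats = Counter()
    padded = ["<s>"] + list(seq) + ["</s>"]
    for order in range(1, n + 1):
        for i in range(len(padded) - order + 1):
            feats[(side, tuple(padded[i:i + order]))] += 1
    return feats

def path_features(path, n=2):
    """A path is (input_labels, output_labels); each side gets its own n-grams."""
    inp, out = path
    feats = ngram_features(inp, n, "in")
    feats.update(ngram_features(out, n, "out"))  # Counter.update adds counts
    return feats

def score(weights, path, base_score):
    """Score from the initial model plus the learned linear correction."""
    correction = sum(weights.get(f, 0.0) * c
                     for f, c in path_features(path).items())
    return base_score + correction

def perceptron_update(weights, hyps, oracle_idx, lr=1.0):
    """One perceptron-style step (assumed criterion, not the paper's).

    hyps: list of (path, base_score); oracle_idx: index of the hypothesis
    with the lowest error. Returns the index that was ranked first.
    """
    best_idx = max(range(len(hyps)), key=lambda i: score(weights, *hyps[i]))
    if best_idx != oracle_idx:
        for f, c in path_features(hyps[oracle_idx][0]).items():
            weights[f] = weights.get(f, 0.0) + lr * c
        for f, c in path_features(hyps[best_idx][0]).items():
            weights[f] = weights.get(f, 0.0) - lr * c
    return best_idx

def split_by_side(weights):
    """Separate the weights into input-side and output-side n-gram tables."""
    inp = {g: w for (side, g), w in weights.items() if side == "in"}
    out = {g: w for (side, g), w in weights.items() if side == "out"}
    return inp, out
```

Because every feature belongs to exactly one side, the learned weights separate cleanly into two tables, which mirrors the factorization into an input-side and an output-side acceptor described in the abstract. Compiling each table into an actual WFSA is outside the scope of this sketch.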

Pages: 1360-1367

Page count: 8

