Composite decision by Bayesian inference in distant-talking speech recognition

Cited by: 0

Authors:

Ji, Mikyong [1]

Kim, Sungtak [1]

Kim, Hoirin [1]

Affiliations:

[1] Informat & Commun Univ, SRT Lab, Taejon 305732, South Korea

Source:

TEXT, SPEECH AND DIALOGUE, PROCEEDINGS | 2006 / Vol. 4188

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

This paper describes an integrated system that produces a composite recognition output for distant-talking speech when recognition results from multiple microphone inputs are available. In many cases, the composite recognition result has a lower error rate than any individual output. In this work, the composite recognition result is obtained by applying Bayesian inference. The log-likelihood score is assumed to follow a Gaussian distribution, at least approximately. First, the distribution of the likelihood score is estimated on the development set. Then, the confidence interval for the likelihood score is used to remove unreliable microphone channels. Finally, the area under the distribution between the likelihood score of a hypothesis and that of the (N+1)st hypothesis is obtained for every channel and integrated over all channels by Bayesian inference. The proposed system considerably outperforms both the conventional method of summing likelihoods across channels and the recognition result of any single channel.
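
The three steps named in the abstract can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the abstract does not say how the confidence interval is applied or how the per-channel areas are combined. The sketch assumes that a channel is dropped when its best score falls outside a central 95% interval, and that channels are treated as conditionally independent so their areas are multiplied (summed in the log domain). All function names and parameters are hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev


def fit_score_distribution(dev_scores):
    """Fit a Gaussian to log-likelihood scores from a development set."""
    return NormalDist(mean(dev_scores), stdev(dev_scores))


def is_reliable(dist, top_score, confidence=0.95):
    """Assumed rule: keep a channel only if its best score lies inside
    the central confidence interval of the fitted Gaussian."""
    tail = (1.0 - confidence) / 2.0
    low, high = dist.inv_cdf(tail), dist.inv_cdf(1.0 - tail)
    return low <= top_score <= high


def channel_evidence(dist, score, floor_score):
    """Area under the Gaussian between the (N+1)st hypothesis score
    (floor_score) and the score of the hypothesis being evaluated."""
    return max(dist.cdf(score) - dist.cdf(floor_score), 1e-12)


def composite_decision(channels, confidence=0.95):
    """channels: list of (dist, nbest, floor_score) tuples, where nbest
    maps each hypothesis to its log-likelihood score on that channel."""
    kept = [c for c in channels
            if is_reliable(c[0], max(c[1].values()), confidence)]
    if not kept:
        kept = channels  # assumed fallback: use every channel if none passes
    hyps = set().union(*(nbest.keys() for _, nbest, _ in kept))
    log_post = {}
    for hyp in hyps:
        total = 0.0
        for dist, nbest, floor_score in kept:
            # A hypothesis missing from a channel's N-best list is scored
            # at the (N+1)st score, giving it the minimum evidence.
            score = nbest.get(hyp, floor_score)
            total += math.log(channel_evidence(dist, score, floor_score))
        log_post[hyp] = total
    return max(log_post, key=log_post.get), log_post
```

For example, with one channel whose scores fall in the typical range and a second channel whose best score is an outlier, the second channel is discarded and the decision rests on the first. Whether the paper uses a uniform prior, channel weights, or a different reliability test cannot be determined from the abstract.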

Pages: 463-470

Page count: 8
