Composite decision by Bayesian inference in distant-talking speech recognition

Cited by: 0

Authors:

Ji, Mikyong [1]

Kim, Sungtak [1]

Kim, Hoirin [1]

Affiliation:

[1] Informat & Commun Univ, SRT Lab, Taejon 305732, South Korea

Source:

TEXT, SPEECH AND DIALOGUE, PROCEEDINGS | 2006 / Vol. 4188

Keywords:

DOI:

Not available

Chinese Library Classification:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

This paper describes an integrated system that produces a composite recognition output for distant-talking speech when recognition results from multiple microphone inputs are available. In many cases, the composite recognition result has a lower error rate than any individual output. In this work, the composite recognition result is obtained by applying Bayesian inference. The log-likelihood score is assumed to follow a Gaussian distribution, at least approximately. First, the distribution of the likelihood score is estimated on the development set. Then, the confidence interval for the likelihood score is used to remove unreliable microphone channels. Finally, the area under the distribution between the likelihood score of a hypothesis and that of the (N+1)st hypothesis is obtained for every channel and integrated over all channels by Bayesian inference. The proposed system shows considerable performance improvement over both an ordinary method that sums the likelihoods and the recognition result of any individual channel.
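The three steps in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the exact reliability test or combination rule, so the sketch assumes that a channel is kept when its top score lies within mean ± z·sigma of the development-set Gaussian, and that channels are integrated as a naive-Bayes product of per-channel areas with a uniform prior. All function names, the `z` threshold, and the `eps` floor are hypothetical.

```python
import math
from statistics import mean, stdev


def normal_cdf(x, mu, sigma):
    """Cumulative distribution function of a Gaussian N(mu, sigma^2)."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def fit_gaussian(dev_scores):
    """Step 1: estimate the score distribution from development-set scores."""
    return mean(dev_scores), stdev(dev_scores)


def is_reliable(top_score, mu, sigma, z=1.96):
    """Step 2 (assumed form): keep a channel whose best score lies inside
    the interval mu +/- z * sigma."""
    return abs(top_score - mu) <= z * sigma


def channel_areas(nbest, mu, sigma):
    """Step 3a: for each of the top N hypotheses, the area under the Gaussian
    between its score and the score of the (N+1)st hypothesis.
    `nbest` is a best-first list of (hypothesis, log_likelihood), length N+1."""
    floor_cdf = normal_cdf(nbest[-1][1], mu, sigma)
    return {hyp: normal_cdf(score, mu, sigma) - floor_cdf
            for hyp, score in nbest[:-1]}


def composite_decision(channels, z=1.96, eps=1e-6):
    """Step 3b (assumed form): multiply the areas across reliable channels
    (sum of logs, uniform prior) and return the best hypothesis.
    `channels` is a list of (nbest, (mu, sigma)) pairs."""
    kept = [(nb, g) for nb, g in channels if is_reliable(nb[0][1], *g, z=z)]
    if not kept:  # assumption: fall back to all channels if none pass
        kept = channels
    areas = [channel_areas(nb, mu, sigma) for nb, (mu, sigma) in kept]
    hypotheses = set().union(*areas)
    log_post = {h: sum(math.log(max(a.get(h, 0.0), eps)) for a in areas)
                for h in hypotheses}
    return max(log_post, key=log_post.get)
```

Working in the log domain avoids underflow when many channels are multiplied, and the `eps` floor keeps a hypothesis that is missing from one channel's N-best list from being assigned a log of zero.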

Pages: 463-470

Page count: 8
