Composite decision by Bayesian inference in distant-talking speech recognition

Cited by: 0
Authors
Ji, Mikyong [1]
Kim, Sungtak [1]
Kim, Hoirin [1]
Affiliations
[1] Informat & Commun Univ, SRT Lab, Taejon 305732, South Korea
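Source
Keywords
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405
Abstract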
This paper describes an integrated system that produces a composite recognition output for distant-talking speech when recognition results from multiple microphone inputs are available. In many cases, the composite recognition result has a lower error rate than any individual output. In this work, the composite recognition result is obtained by applying Bayesian inference. The log-likelihood score is assumed to follow a Gaussian distribution, at least approximately. First, the distribution of the likelihood score is estimated on the development set. Then, the confidence interval for the likelihood score is used to remove unreliable microphone channels. Finally, the area under the distribution between the likelihood score of a hypothesis and that of the (N+1)st hypothesis is obtained for every channel and integrated over all channels by Bayesian inference. The proposed system shows considerable performance improvement over both an ordinary method that sums the likelihoods and the recognition result of any individual channel.
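The three steps in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the exact formulas, so the function names, the 1.96 z-value for the confidence interval, the rule of testing each channel's best score against that interval, and the naive-Bayes style product used to integrate channels are all assumptions made for illustration.

```python
import math
from statistics import NormalDist


def fit_scores(dev_scores):
    """Step 1: fit a Gaussian to log-likelihood scores from a development set."""
    return NormalDist.from_samples(dev_scores)


def is_reliable(dist, top_score, z=1.96):
    """Step 2 (assumed rule): keep a channel only if its best score lies
    inside the interval mean +/- z * stdev of the fitted distribution."""
    return abs(top_score - dist.mean) <= z * dist.stdev


def area_between(dist, score, floor_score):
    """Step 3: area under the Gaussian between the (N+1)st hypothesis
    score (floor_score) and the score of the hypothesis itself."""
    return max(dist.cdf(score) - dist.cdf(floor_score), 0.0)


def combine(channels, eps=1e-12):
    """Integrate per-channel areas into normalized weights per hypothesis.

    channels: list of (dist, nbest, floor_score), where nbest maps a
    hypothesis string to its log-likelihood score on that channel.
    Assumption: channels are treated as independent, so areas multiply.
    """
    kept = [c for c in channels if is_reliable(c[0], max(c[1].values()))]
    if not kept:  # assumed fallback: use every channel if all were rejected
        kept = channels
    hyps = set().union(*(c[1].keys() for c in kept))
    log_w = {}
    for h in hyps:
        total = 0.0
        for dist, nbest, floor_score in kept:
            # A hypothesis missing from a channel's N-best list gets zero area.
            area = area_between(dist, nbest.get(h, floor_score), floor_score)
            total += math.log(area + eps)  # eps avoids log(0)
        log_w[h] = total
    peak = max(log_w.values())
    weights = {h: math.exp(v - peak) for h, v in log_w.items()}
    norm = sum(weights.values())
    return {h: w / norm for h, w in weights.items()}
```

Calling `combine` with one tuple per microphone channel returns a dictionary of weights that sum to 1, and the hypothesis with the largest weight would be taken as the composite result. Working in the log domain keeps the product of small areas numerically stable.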
Pages: 463-470
Number of pages: 8