Distant-talking speech recognition based on a 3-D Viterbi search using a microphone array

Cited by: 10
Authors
Yamada, T [1]
Nakamura, S [1]
Shikano, K [1]
Affiliation
[1] Univ Tsukuba, Inst Informat Sci & Elect, Tsukuba, Ibaraki 3058573, Japan
Keywords
distant-talking situations; microphone arrays; real environments; speech recognition; talker localization
DOI
10.1109/89.985542
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
This paper focuses on microphone arrays to realize distant-talking speech recognition in real environments. In distant-talking situations, users can speak from arbitrary positions and may move while speaking. High-quality speech acquisition with a microphone array therefore depends on localizing the talker accurately. However, localizing a moving talker is very difficult in noisy and reverberant environments, and talker localization errors degrade speech recognition performance. One way to solve this problem is to integrate speech recognition and talker localization into a unified framework. This paper proposes a new speech recognition algorithm based on a three-dimensional (3-D) Viterbi search. The 3-D Viterbi method extracts a direction-time sequence of parameter vectors by steering a beam to every direction in every frame, then finds the most likely path in a 3-D trellis space composed of talker directions, input frames, and HMM states. Speech recognition and talker localization are thus performed simultaneously within a statistical framework. To evaluate the 3-D Viterbi method, recognition experiments were carried out on data recorded in real environments. The results confirmed that the 3-D Viterbi method drastically improves recognition performance for a moving talker as well as for a talker at a fixed position.
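The abstract describes the search only in words. The following is a minimal sketch of a Viterbi search over a trellis whose cells are (direction, HMM state) pairs per frame, using the recursion delta_t(d, s) = max over (d', s') of [delta_{t-1}(d', s') + log_dir(d', d) + log_a(s', s)] + log b_s(o_t^d). It assumes the per-direction, per-state observation log-likelihoods `log_obs` have already been computed from beams steered to each candidate direction. Several parts are illustrative assumptions and not the authors' exact formulation: the direction-transition matrix `log_dir`, the uniform prior over the starting direction, and the hypothetical function name `viterbi_3d`.

```python
import math
from typing import List, Sequence, Tuple

NEG_INF = float("-inf")  # log(0)


def viterbi_3d(
    log_obs: Sequence[Sequence[Sequence[float]]],  # [T][D][S]: log b_s(o_t^d)
    log_trans: Sequence[Sequence[float]],          # [S][S]: HMM state transitions
    log_dir: Sequence[Sequence[float]],            # [D][D]: direction transitions (hypothetical)
    log_init: Sequence[float],                     # [S]: initial HMM state log-probs
) -> Tuple[float, List[Tuple[int, int]]]:
    """Return (best log-score, [(direction, state) per frame]) over the 3-D trellis."""
    T, D, S = len(log_obs), len(log_obs[0]), len(log_obs[0][0])

    # Frame 0: uniform prior over directions (an assumption) + initial state + observation.
    delta = [[-math.log(D) + log_init[s] + log_obs[0][d][s] for s in range(S)]
             for d in range(D)]
    backptrs: List[List[List[Tuple[int, int]]]] = []

    for t in range(1, T):
        new_delta = [[NEG_INF] * S for _ in range(D)]
        ptr = [[(-1, -1)] * S for _ in range(D)]
        for d in range(D):
            for s in range(S):
                # Best predecessor over all (direction, state) pairs of the previous frame.
                best, arg = NEG_INF, (-1, -1)
                for pd in range(D):
                    for ps in range(S):
                        score = delta[pd][ps] + log_dir[pd][d] + log_trans[ps][s]
                        if score > best:
                            best, arg = score, (pd, ps)
                new_delta[d][s] = best + log_obs[t][d][s]
                ptr[d][s] = arg
        delta = new_delta
        backptrs.append(ptr)

    # Pick the best final cell, then follow back-pointers to frame 0.
    best_score, cell = max((delta[d][s], (d, s)) for d in range(D) for s in range(S))
    path = [cell]
    for ptr in reversed(backptrs):
        cell = ptr[cell[0]][cell[1]]
        path.append(cell)
    path.reverse()
    return best_score, path


# Toy example: 3 frames, 2 beam directions, a 2-state left-to-right HMM.
ln = math.log
log_trans = [[ln(0.5), ln(0.5)], [NEG_INF, 0.0]]  # state 1 cannot return to state 0
log_init = [0.0, NEG_INF]                          # must start in state 0
log_dir = [[ln(0.8), ln(0.2)], [ln(0.2), ln(0.8)]]  # talker tends to stay in place
log_obs = [
    [[-1.0, -5.0], [-4.0, -5.0]],  # frame 0: beam 0 matches state 0
    [[-4.0, -6.0], [-1.0, -5.0]],  # frame 1: talker has moved to beam 1
    [[-6.0, -4.0], [-5.0, -1.0]],  # frame 2: beam 1 matches state 1
]
score, path = viterbi_3d(log_obs, log_trans, log_dir, log_init)
print(path)  # [(0, 0), (1, 0), (1, 1)]
```

The returned path gives both a direction track (the localization result) and an HMM state sequence (the recognition result) from a single search. The exhaustive maximization over every (d', s') costs O(T·D²·S²) per utterance; a practical decoder would likely restrict direction changes to neighboring beams and prune the trellis, but those refinements are omitted from this sketch.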
Pages: 48-56
Number of pages: 9