Distant-talking speech recognition based on a 3-D Viterbi search using a microphone array

Cited by: 10
Authors
Yamada, T [1 ]
Nakamura, S [1 ]
Shikano, K [1 ]
Affiliations
[1] Univ Tsukuba, Inst Informat Sci & Elect, Tsukuba, Ibaraki 3058573, Japan
Source
Keywords
distant-talking situations; microphone arrays; real environments; speech recognition; talker localization;
DOI
10.1109/89.985542
Chinese Library Classification number
O42 [Acoustics];
Discipline classification code
070206 ; 082403 ;
Abstract
This paper focuses on microphone arrays to realize distant-talking speech recognition in real environments. In distant-talking situations, users can speak at arbitrary positions while moving. Therefore, it is very important for high quality speech acquisition using microphone arrays to localize a talker accurately. However, it is very difficult to localize a moving talker in noisy and reverberant environments. The talker localization errors result in performance degradation of speech recognition. One way to solve this problem is to integrate the speech recognition process and the talker localization into a unified framework. This paper proposes a new speech recognition algorithm based on a three-dimensional (3-D) Viterbi search. The 3-D Viterbi method extracts a direction-time sequence of parameter vectors by steering a beam to every direction in every frame, then finds the most likely path in a 3-D trellis space composed of talker directions, input frames and HMM states. This means that speech recognition and talker localization are performed simultaneously within a statistical framework. To evaluate the performance of the 3-D Viterbi method, recognition experiments for real environment data were carried out. The results confirmed that the 3-D Viterbi method drastically improves the recognition performance for the moving talker case as well as for the fixed-position talker case.
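The search over talker directions, input frames and HMM states described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `viterbi_3d`, the direction-transition matrix `log_a_dir`, the initial distributions, and the assumption that the observation log-likelihoods `log_b[t, d, q]` (frame t, beam direction d, HMM state q) are computed beforehand are all hypothetical choices made for illustration. The sketch also does not force the path to end in a final HMM state.

```python
import numpy as np

def viterbi_3d(log_b, log_a_state, log_a_dir, log_pi_state, log_pi_dir):
    """Best joint (direction, state) path through a frames x directions x states trellis.

    log_b[t, d, q]    : log-likelihood of the feature vector obtained by steering
                        the beam to direction d at frame t, under HMM state q
                        (assumed to be precomputed).
    log_a_state[p, q] : log HMM transition probability from state p to state q.
    log_a_dir[c, d]   : log transition probability from direction c to direction d
                        (an illustrative model of talker movement).
    """
    T, D, Q = log_b.shape
    # Joint transition score from (d_prev, q_prev) to (d, q); shape (D, Q, D, Q).
    log_a = log_a_dir[:, None, :, None] + log_a_state[None, :, None, :]
    # Initialise the first frame of the trellis.
    delta = log_pi_dir[:, None] + log_pi_state[None, :] + log_b[0]
    back = np.zeros((T, D, Q), dtype=np.int64)
    for t in range(1, T):
        # scores[d_prev, q_prev, d, q], flattened over the predecessor pair.
        scores = (delta[:, :, None, None] + log_a).reshape(D * Q, D, Q)
        back[t] = scores.argmax(axis=0)          # best predecessor, as d_prev * Q + q_prev
        delta = scores.max(axis=0) + log_b[t]
    # Backtrack from the best final (direction, state) pair.
    d, q = divmod(int(delta.argmax()), Q)
    path = [(d, q)]
    for t in range(T - 1, 0, -1):
        d, q = divmod(int(back[t, d, q]), Q)
        path.append((d, q))
    path.reverse()
    return float(delta.max()), path
```

The returned path gives one (direction, state) pair per frame, so the direction component is a talker trajectory estimate and the state component is the recognition alignment, both obtained from the same search.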
Pages: 48-56
Page count: 9
Related papers
50 in total
  • [31] JOINT SPARSE REPRESENTATION BASED CEPSTRAL-DOMAIN DEREVERBERATION FOR DISTANT-TALKING SPEECH RECOGNITION
    Li, Weifeng
    Wang, Longbiao
    Zhou, Fei
    Liao, Qingmin
    2013 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2013, : 7117 - 7120
  • [32] CENSREC-4: An evaluation framework for distant-talking speech recognition in reverberant environments
    Fukumori, Takahiro
    Nishiura, Takanobu
    Nakayama, Masato
    Denda, Yuki
    Kitaoka, Norihide
    Yamada, Takeshi
    Yamamoto, Kazumasa
    Tsuge, Satoru
    Fujimoto, Masakiyo
    Takiguchi, Tetsuya
    Miyajima, Chiyomi
    Tamura, Satoshi
    Ogawa, Tetsuji
    Matsuda, Shigeki
    Kuroiwa, Shingo
    Takeda, Kazuya
    Nakamura, Satoshi
    ACOUSTICAL SCIENCE AND TECHNOLOGY, 2011, 32 (05) : 201 - 210
  • [33] Microphone Array Processing Strategies for Distant-Based Automatic Speech Recognition
    Khoubrouy, Soudeh A.
    Hansen, John H. L.
    IEEE SIGNAL PROCESSING LETTERS, 2016, 23 (10) : 1344 - 1348
  • [34] Dereverberation based on Generalized Spectral Subtraction for Distant-talking Speaker Recognition
    Zhang, Zhaofeng
    Wang, Longbiao
    Kai, Atsuhiko
    2012 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2012,
  • [35] Microphone Array Processing for Distant Speech Recognition: Spherical Arrays
    McDonough, John
    Kumatani, Kenichi
    Raj, Bhiksha
    2012 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2012,
  • [36] HMM adaptation and microphone array processing for distant speech recognition
    Kleban, J
    Gong, YF
    2000 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, PROCEEDINGS, VOLS I-VI, 2000, : 1411 - 1414
  • [37] Multi-party Human-Robot Interaction with Distant-Talking Speech Recognition
    Gomez, Randy
    Kawahara, Tatsuya
    Nakamura, Keisuke
    Nakadai, Kazuhiro
    HRI'12: PROCEEDINGS OF THE SEVENTH ANNUAL ACM/IEEE INTERNATIONAL CONFERENCE ON HUMAN-ROBOT INTERACTION, 2012, : 439 - 446
  • [38] Investigations into Early and Late Reflections on Distant-Talking Speech Recognition Toward Suitable Reverberation Criteria
    Nishiura, Takanobu
    Hirano, Yoshiki
    Denda, Yuki
    Nakayama, Masato
    INTERSPEECH 2007: 8TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION, VOLS 1-4, 2007, : 1369 - 1372
  • [39] Distant-talking speech recognition using multi-channel LMS and multiple-step linear prediction
    Shiota, Satoshi
    Wang, Longbiao
    Odani, Kyohei
    Kai, Atsuhiko
    Li, Weifeng
    2014 9TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP), 2014, : 384 - +
  • [40] Minimum Kullback-Leibler distance based multivariate Gaussian feature adaptation for distant-talking speech recognition
    Pan, Y
    Waibel, A
    2004 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL I, PROCEEDINGS: SPEECH PROCESSING, 2004, : 1029 - 1032