Distant-talking speech recognition based on a 3-D Viterbi search using a microphone array

Cited by: 10
Authors
Yamada, T [1]
Nakamura, S [1]
Shikano, K [1]
Affiliations
[1] Univ Tsukuba, Inst Informat Sci & Elect, Tsukuba, Ibaraki 3058573, Japan
Source
Keywords
distant-talking situations; microphone arrays; real environments; speech recognition; talker localization;
DOI
10.1109/89.985542
Chinese Library Classification (CLC) number
O42 [Acoustics];
Discipline classification code
070206 ; 082403 ;
Abstract
This paper focuses on microphone arrays to realize distant-talking speech recognition in real environments. In distant-talking situations, users may speak from arbitrary positions and may move while speaking. Accurate talker localization is therefore essential for high-quality speech acquisition with microphone arrays. However, it is very difficult to localize a moving talker in noisy and reverberant environments, and talker localization errors degrade speech recognition performance. One way to solve this problem is to integrate the speech recognition process and the talker localization into a unified framework. This paper proposes a new speech recognition algorithm based on a three-dimensional (3-D) Viterbi search. The 3-D Viterbi method extracts a direction-time sequence of parameter vectors by steering a beam to every direction in every frame, then finds the most likely path in a 3-D trellis space composed of talker directions, input frames and HMM states. This means that speech recognition and talker localization are performed simultaneously within a statistical framework. To evaluate the performance of the 3-D Viterbi method, recognition experiments for real environment data were carried out. The results confirmed that the 3-D Viterbi method drastically improves the recognition performance for the moving talker case as well as for the fixed-position talker case.
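The search over talker directions, input frames and HMM states described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the beam-steered features have already been scored, so that `log_emit[d, t, s]` holds the log-likelihood of the frame-`t` parameter vector from direction `d` under HMM state `s`. The function name `viterbi_3d`, the direction-transition matrix `log_a_dir` (the abstract does not say how direction changes are scored), and the choice to let the path end in any state are all assumptions made for illustration.

```python
import numpy as np

def viterbi_3d(log_emit, log_a_state, log_a_dir, log_init_state):
    """Hypothetical sketch of a Viterbi search over (direction, frame, state).

    log_emit:       (D, T, S) log-likelihoods per direction, frame, state
    log_a_state:    (S, S) HMM state log transition matrix, indexed [from, to]
    log_a_dir:      (D, D) assumed direction log transition matrix, [from, to]
    log_init_state: (S,)   log initial state probabilities
    Returns (best log score, direction path, state path).
    """
    D, T, S = log_emit.shape
    delta = np.full((T, D, S), -np.inf)          # best score ending at (t, d, s)
    back = np.zeros((T, D, S, 2), dtype=int)     # best predecessor (d_prev, s_prev)

    # Frame 0: any direction is allowed as a starting point.
    delta[0] = log_init_state[None, :] + log_emit[:, 0, :]

    for t in range(1, T):
        # cand[d_prev, s_prev, d, s]: score of arriving at (d, s) from (d_prev, s_prev)
        cand = (delta[t - 1][:, :, None, None]
                + log_a_dir[:, None, :, None]
                + log_a_state[None, :, None, :])
        flat = cand.reshape(D * S, D, S)
        best = flat.argmax(axis=0)               # flat index of best predecessor
        delta[t] = flat.max(axis=0) + log_emit[:, t, :]
        back[t, :, :, 0], back[t, :, :, 1] = np.unravel_index(best, (D, S))

    # Backtrack from the best final cell (assumption: any final state is allowed).
    d, s = np.unravel_index(delta[T - 1].argmax(), (D, S))
    dirs, states = [int(d)], [int(s)]
    for t in range(T - 1, 0, -1):
        d, s = back[t, d, s]
        dirs.append(int(d))
        states.append(int(s))
    dirs.reverse()
    states.reverse()
    return float(delta[T - 1].max()), dirs, states
```

The returned direction path is the talker-localization result and the state path is the recognition alignment, so both come out of a single search. The cost per frame is on the order of (D·S)², so a practical system would likely restrict direction changes to neighboring directions or prune the trellis.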
Pages: 48-56
Page count: 9
Related papers
50 in total
  • [1] Hands-free speech recognition based on 3-D viterbi search using a microphone array
    Yamada, T
    Nakamura, S
    Shikano, K
    PROCEEDINGS OF THE 1998 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, VOLS 1-6, 1998, : 245 - 248
  • [2] 3-D N-best search for simultaneous recognition of distant-talking speech of multiple talkers
    Nakamura, S
    Heracleous, P
    FOURTH IEEE INTERNATIONAL CONFERENCE ON MULTIMODAL INTERFACES, PROCEEDINGS, 2002, : 59 - 63
  • [3] Simultaneous recognition of distant-talking speech of multiple talkers based on the 3-D N-best search method
    Heracleous, P
    Nakamura, S
    Shikano, K
    JOURNAL OF VLSI SIGNAL PROCESSING SYSTEMS FOR SIGNAL IMAGE AND VIDEO TECHNOLOGY, 2004, 36 (2-3): : 105 - 116
  • [4] Simultaneous Recognition of Distant-Talking Speech of Multiple Talkers Based on the 3-D N-Best Search Method
    Panikos Heracleous
    Satoshi Nakamura
    Kiyohiro Shikano
    Journal of VLSI signal processing systems for signal, image and video technology, 2004, 36 : 105 - 116
  • [5] Simultaneous recognition of distant-talking speech of multiple sound sources based on 3-D N-best search algorithm
    Heracleous, P
    Nakamura, S
    Shikano, K
    ASRU 2001: IEEE WORKSHOP ON AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING, CONFERENCE PROCEEDINGS, 2001, : 111 - 114
  • [6] Robust distant-talking speech recognition
    Lin, Q
    Che, C
    Yuk, DS
    Jin, L
    deVries, B
    Pearson, J
    Flanagan, J
    1996 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, CONFERENCE PROCEEDINGS, VOLS 1-6, 1996, : 21 - 24
  • [7] Distant-talking speech recognition with microphone-array sound pickup and NN/MLLR environment equalization
    Lin, QG
    Flanagan, J
    Che, CW
    PROGRESS IN CONNECTIONIST-BASED INFORMATION SYSTEMS, VOLS 1 AND 2, 1998, : 1099 - 1102
  • [8] Improved HMM separation for distant-talking speech recognition
    Takiguchi, T
    Nishimura, M
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2004, E87D (05): : 1127 - 1137
  • [9] Distant Speech Recognition Using a Microphone Array Network
    Nakano, Alberto Yoshihiro
    Nakagawa, Seiichi
    Yamamoto, Kazumasa
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2010, E93D (09): : 2451 - 2462
  • [10] Experiments on distant-talking speech recognition in meeting room using extended MAM
    Pan, Y
    Waibel, A
    2002 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS I-IV, PROCEEDINGS, 2002, : 4165 - 4165