Distant-talking speech recognition based on a 3-D Viterbi search using a microphone array

Cited by: 10

Authors:

Yamada, T [1]

Nakamura, S [1]

Shikano, K [1]

Affiliations:

[1] Univ Tsukuba, Inst Informat Sci & Elect, Tsukuba, Ibaraki 3058573, Japan

Source:

IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING | 2002 / Vol. 10 / No. 02

Keywords:

distant-talking situations; microphone arrays; real environments; speech recognition; talker localization

DOI:

10.1109/89.985542

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Discipline classification codes:

070206 ; 082403 ;

Abstract:

This paper focuses on microphone arrays as a means of realizing distant-talking speech recognition in real environments. In distant-talking situations, users can speak from arbitrary positions and may move while speaking. Accurate talker localization is therefore essential for high-quality speech acquisition with a microphone array. However, localizing a moving talker in noisy and reverberant environments is very difficult, and talker localization errors degrade speech recognition performance. One way to solve this problem is to integrate speech recognition and talker localization into a unified framework. This paper proposes a new speech recognition algorithm based on a three-dimensional (3-D) Viterbi search. The 3-D Viterbi method extracts a direction-time sequence of parameter vectors by steering a beam in every direction in every frame, then finds the most likely path in a 3-D trellis space composed of talker directions, input frames, and HMM states. Speech recognition and talker localization are thus performed simultaneously within a statistical framework. To evaluate the 3-D Viterbi method, recognition experiments on real-environment data were carried out. The results confirmed that the 3-D Viterbi method drastically improves recognition performance for both the moving-talker case and the fixed-position talker case.
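The search over directions, frames, and HMM states described in the abstract can be illustrated with a small dynamic-programming sketch. This is not the authors' implementation: the function name `viterbi_3d`, the array layouts, and in particular the direction-transition score `log_move` are assumptions made here for illustration, since the abstract does not say how movement between directions is scored. The sketch also assumes the per-direction beamforming and feature scoring have already been done and are supplied as a log-likelihood array.

```python
import numpy as np

def viterbi_3d(log_emit, log_trans, log_move, log_init):
    """Best joint (direction, state) path through a 3-D trellis.

    log_emit:  (T, D, S) log-likelihood of the feature vector obtained at
               frame t with the beam steered to direction d, under state s.
    log_trans: (S, S) HMM state-transition log-probabilities [from, to].
    log_move:  (D, D) direction-transition log-scores [from, to]
               (an assumption of this sketch, not taken from the paper).
    log_init:  (S,) initial state log-probabilities.
    """
    T, D, S = log_emit.shape
    # delta[d, s]: best log score of any path ending at direction d, state s.
    delta = log_init[None, :] + log_emit[0]
    back = np.zeros((T, D, S, 2), dtype=int)
    for t in range(1, T):
        # cand[d_prev, s_prev, d, s]: score of extending each previous cell.
        cand = (delta[:, :, None, None]
                + log_move[:, None, :, None]
                + log_trans[None, :, None, :])
        flat = cand.reshape(D * S, D, S)
        best = flat.argmax(axis=0)
        # Flat index is d_prev * S + s_prev, so divmod recovers both parts.
        back[t, :, :, 0], back[t, :, :, 1] = np.divmod(best, S)
        delta = flat.max(axis=0) + log_emit[t]
    # Pick the best final cell, then trace the path backwards.
    d, s = np.unravel_index(delta.argmax(), delta.shape)
    score = float(delta[d, s])
    dirs, states = [int(d)], [int(s)]
    for t in range(T - 1, 0, -1):
        d, s = back[t, d, s]
        dirs.append(int(d))
        states.append(int(s))
    return score, dirs[::-1], states[::-1]
```

The returned direction path plays the role of the talker localization result and the state path that of the recognition alignment, which is one way to read "performed simultaneously". A practical decoder would likely also constrain the final state and prune the trellis; both are omitted here.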

Pages: 48-56

Page count: 9

50 items in total

[21] Hidden Markov model training with contaminated speech material for distant-talking speech recognition
Matassoni, M
Omologo, M
Giuliani, D
Svaizer, P
COMPUTER SPEECH AND LANGUAGE, 2002, 16 (02): 205-223
[22] Distant-talking Continuous Speech Recognition based on a novel Reverberation Model in the Feature Domain
Sehr, Armin
Zeller, Marcus
Kellermann, Walter
INTERSPEECH 2006 AND 9TH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, VOLS 1-5, 2006: 769-772
[23] Combination of bottleneck feature extraction and dereverberation for distant-talking speech recognition
Ren, Bo
Wang, Longbiao
Lu, Liang
Ueda, Yuma
Kai, Atsuhiko
MULTIMEDIA TOOLS AND APPLICATIONS, 2016, 75 (09): 5093-5108
[24] Denoising autoencoder and environment adaptation for distant-talking speech recognition with asynchronous speech recording
Wang, Longbiao
Ren, Bo
Ueda, Yuma
Kai, Atsuhiko
Teraoka, Shunta
Fukushima, Taku
2014 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA), 2014
[25] A TWO-MICROPHONE BASED VOICE ACTIVITY DETECTION FOR DISTANT-TALKING SPEECH IN WIDE RANGE OF DIRECTION OF ARRIVAL
Guo, Yanmeng
Li, Kai
Fu, Qiang
Yan, Yonghong
2012 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2012: 4901-4904
[26] Distant-Talking Speech Recognition Based on Spectral Subtraction by Multi-Channel LMS Algorithm
Wang, Longbiao
Kitaoka, Norihide
Nakagawa, Seiichi
IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2011, E94D (03): 659-667
[28] Distant-talking Speech Recognition Based on Multi-objective Learning using Phase and Magnitude-based Feature
Li, Dongbo
Wang, Longbiao
Dang, Jianwu
Ge, Meng
Guan, Haotian
2018 11TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP), 2018: 394-398
[29] Reverberation Model-Based Decoding in the Logmelspec Domain for Robust Distant-Talking Speech Recognition
Sehr, Armin
Maas, Roland
Kellermann, Walter
IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2010, 18 (07): 1676-1691
[30] Distant-talking robust speech recognition using late reflection components of room impulse response
Gomez, Randy
Even, Jani
Saruwatari, Hiroshi
Shikano, Kiyohiro
2008 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, VOLS 1-12, 2008: 4581-4584
