Missing-feature based speech recognition for two simultaneous speech signals separated by ICA with a pair of humanoid ears

Cited by: 6

Authors:

Takeda, Ryu [1]

Yamamoto, Shun'ichi [1]

Komatani, Kazunori [1]

Ogata, Tetsuya [1]

Okuno, Hiroshi G. [1]

Affiliation:

[1] Kyoto Univ, Grad Sch Informat, Kyoto 6068501, Japan

Source:

2006 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS, VOLS 1-12 | 2006

Keywords:

robot audition; multiple speakers; ICA; missing-feature methods; automatic speech recognition;

DOI:

10.1109/IROS.2006.281741

Chinese Library Classification:

TP [Automation technology, computer technology];

Discipline classification code:

0812;

Abstract:

Robot audition is a critical technology for enabling robots to live in symbiosis with people. Since we hear a mixture of sounds in our daily lives, sound source localization, sound source separation, and recognition of the separated sounds are three essential capabilities. Sound source localization has recently been studied well for robots, while the other two capabilities still need extensive study. This paper reports a robot audition system that uses a pair of omni-directional microphones embedded in a humanoid to recognize two simultaneous talkers. It first separates the sound sources by Independent Component Analysis (ICA) with a single-input multiple-output (SIMO) model. Then, the spectral distortion of the separated sounds is estimated to identify reliable and unreliable components of the spectrogram. This estimate is used to generate missing-feature masks in the form of spectrographic masks. These masks are then used in automatic speech recognition based on the missing-feature method to avoid the influence of spectral distortion. The novelty of our system lies in estimating spectral distortion in the time-frequency domain in terms of feature vectors. In addition, we point out that voice-activity detection (VAD) is effective in overcoming a weak point of ICA, namely its sensitivity to changes in the number of talkers. The resulting system outperformed the baseline robot audition system by 15%.
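
The mask-generation step described in the abstract can be illustrated with a minimal sketch. It assumes that per-cell power estimates for the separated signal and its distortion are already available as arrays, and it uses a simple ratio threshold to mark a time-frequency cell as reliable. The function name `missing_feature_mask`, the `threshold_db` parameter, and the thresholding rule itself are assumptions made for illustration; the abstract does not specify how the paper derives its masks.

```python
import numpy as np

def missing_feature_mask(separated_power, distortion_power, threshold_db=0.0):
    """Return a binary spectrographic mask (1 = reliable, 0 = unreliable).

    Hypothetical rule: a time-frequency cell counts as reliable when the
    separated-signal power exceeds the estimated distortion power by more
    than threshold_db decibels.
    """
    eps = 1e-12  # guards against division by zero and log of zero
    ratio_db = 10.0 * np.log10((separated_power + eps) / (distortion_power + eps))
    return (ratio_db > threshold_db).astype(np.uint8)

# Toy 2x2 "spectrogram" (rows = frequency bins, columns = frames).
separated = np.array([[4.0, 1.0], [0.25, 9.0]])
distortion = np.array([[1.0, 2.0], [0.5, 1.0]])
mask = missing_feature_mask(separated, distortion)
print(mask)
```

In a missing-feature recognizer, cells marked 0 would be ignored or marginalized when computing acoustic likelihoods, so that distorted components do not dominate the score.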

Citation

Pages: 878 / +

Page count: 2
