Missing-feature based speech recognition for two simultaneous speech signals separated by ICA with a pair of humanoid ears

Cited by: 6

Authors:

Takeda, Ryu [1]

Yamamoto, Shun'ichi [1]

Komatani, Kazunori [1]

Ogata, Tetsuya [1]

Okuno, Hiroshi G. [1]

Affiliation:

[1] Kyoto Univ, Grad Sch Informat, Kyoto 6068501, Japan

Source:

2006 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS, VOLS 1-12 | 2006

Keywords:

robot audition; multiple speakers; ICA; missing-feature methods; automatic speech recognition;

DOI:

10.1109/IROS.2006.281741

Chinese Library Classification (CLC):

TP [Automation Technology, Computer Technology];

Discipline classification code:

0812;

Abstract:

Robot audition is a critical technology for enabling robots to live in symbiosis with people. Because everyday listening involves mixtures of sounds, three capabilities are essential: sound source localization, sound source separation, and recognition of the separated sounds. Sound source localization has recently been well studied for robots, while the other two capabilities still need extensive study. This paper reports a robot audition system that uses a pair of omni-directional microphones embedded in a humanoid to recognize two simultaneous talkers. It first separates the sound sources by Independent Component Analysis (ICA) with a single-input multiple-output (SIMO) model. Then, the spectral distortion of the separated sounds is estimated to identify reliable and unreliable components of the spectrogram. This estimate yields missing-feature masks in the form of spectrographic masks. The masks are then used in missing-feature-based automatic speech recognition to avoid the influence of spectral distortion. The novelty of the system lies in estimating spectral distortion in the time-frequency domain in terms of feature vectors. In addition, we point out that voice-activity detection (VAD) is effective in overcoming the weakness of ICA when the number of talkers changes. The resulting system outperformed the baseline robot audition system by 15%.
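
The masking step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `missing_feature_mask` and `masked_log_likelihood`, the dB threshold rule, and the diagonal-Gaussian acoustic model are all assumptions chosen for illustration. The sketch marks a time-frequency cell as reliable when the separated signal's power exceeds the estimated distortion power by a threshold, then scores a feature frame using only the reliable components (the marginalization form of missing-feature recognition).

```python
import math


def missing_feature_mask(signal_power, distortion_power, threshold_db=0.0):
    """Return a binary mask with 1 where a time-frequency cell looks reliable.

    Both inputs are lists of frames, each frame a list of per-band powers.
    The threshold rule is an illustrative assumption, not the paper's rule.
    """
    mask = []
    for sig_frame, dist_frame in zip(signal_power, distortion_power):
        row = []
        for s, d in zip(sig_frame, dist_frame):
            # Local signal-to-distortion ratio in dB; the floor avoids log(0).
            ratio_db = 10.0 * math.log10(max(s, 1e-12) / max(d, 1e-12))
            row.append(1 if ratio_db > threshold_db else 0)
        mask.append(row)
    return mask


def masked_log_likelihood(frame, frame_mask, means, variances):
    """Diagonal-Gaussian log-likelihood summed over reliable components only."""
    total = 0.0
    for x, m, mu, var in zip(frame, frame_mask, means, variances):
        if m:  # unreliable components are skipped (marginalized out)
            total += -0.5 * (math.log(2.0 * math.pi * var) + (x - mu) ** 2 / var)
    return total


# Toy example: two frames, three frequency bands (hypothetical values).
signal_power = [[4.0, 1.0, 0.5], [2.0, 8.0, 0.1]]
distortion_power = [[1.0, 2.0, 0.5], [0.5, 1.0, 1.0]]
mask = missing_feature_mask(signal_power, distortion_power)
```

In this toy setup a cell whose signal power merely equals the distortion power is treated as unreliable, because the ratio must strictly exceed the threshold. A real system would derive the distortion estimate from the separation stage rather than take it as given.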

Pages: 878 / +

Page count: 2

33 items in total

[1] Missing-feature approaches in speech recognition
Raj, B
Stern, RM
IEEE SIGNAL PROCESSING MAGAZINE, 2005, 22 (05) : 101 - 116
[2] Improving Speech Recognition of Two Simultaneous Speech Signals by Integrating ICA BSS and Automatic Missing Feature Mask Generation
Takeda, Ryu
Yamamoto, Shun'ichi
Komatani, Kazunori
Ogata, Tetsuya
Okuno, Hiroshi G.
INTERSPEECH 2006 AND 9TH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, VOLS 1-5, 2006 : 2302 - 2305
[3] Soft Missing-Feature Mask Generation for Simultaneous Speech Recognition System in Robots
Takahashi, Toru
Yamamoto, Shun'ichi
Nakadai, Kazuhiro
Komatani, Kazunori
Ogata, Tetsuya
Okuno, Hiroshi G.
INTERSPEECH 2008: 9TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2008, VOLS 1-5, 2008 : 992 - +
[4] MMSE-Based Missing-Feature Reconstruction With Temporal Modeling for Robust Speech Recognition
Gonzalez, Jose A.
Peinado, Antonio M.
Ma, Ning
Gomez, Angel M.
Barker, Jon
IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2013, 21 (03) : 624 - 635
[5] Recognition of convolutive speech mixtures by missing feature techniques for ICA
Kolossa, Dorothea
Sawada, Hiroshi
Astudillo, Ramon Fernandez
Orglmeister, Reinhold
Makino, Shoji
2006 FORTIETH ASILOMAR CONFERENCE ON SIGNALS, SYSTEMS AND COMPUTERS, VOLS 1-5, 2006 : 1397 - +
[6] Mask classification for missing-feature reconstruction for robust speech recognition in unknown background noise
Kim, Wooil
Stern, Richard M.
SPEECH COMMUNICATION, 2011, 53 (01) : 1 - 11
[7] Speech recognition for a humanoid with motor noise utilizing missing feature theory
Nishimura, Yoshitaka
Ishizuka, Mitsuru
Nakadai, Kazuhiro
Nakano, Mikio
Tsujino, Hiroshi
2006 6TH IEEE-RAS INTERNATIONAL CONFERENCE ON HUMANOID ROBOTS, VOLS 1 AND 2, 2006 : 26 - +
[8] Combining Noise Compensation and Missing-Feature Decoding for Large Vocabulary Speech Recognition in Noise
Lu, Jianhua
Ming, Ji
Woods, Roger
INTERSPEECH 2008: 9TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2008, VOLS 1-5, 2008 : 1269 - 1272
[9] Missing-Feature Reconstruction for Band-Limited Speech Recognition in Spoken Document Retrieval
Kim, Wooil
Hansen, John H. L.
INTERSPEECH 2006 AND 9TH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, VOLS 1-5, 2006 : 2306 - 2309
[10] Recognition of simultaneous speech by estimating reliability of separated signals for robot audition
Yamamoto, Shun'ichi
Takeda, Ryu
Nakadai, Kazuhiro
Nakano, Mikio
Tsujino, Hiroshi
Valin, Jean-Marc
Komatani, Kazunori
Ogata, Tetsuya
Okuno, Hiroshi G.
PRICAI 2006: TRENDS IN ARTIFICIAL INTELLIGENCE, PROCEEDINGS, 2006, 4099 : 484 - 494