Missing-feature based speech recognition for two simultaneous speech signals separated by ICA with a pair of humanoid ears

Cited by: 6
Authors
Takeda, Ryu [1]
Yamamoto, Shun'ichi [1]
Komatani, Kazunori [1]
Ogata, Tetsuya [1]
Okuno, Hiroshi G. [1]
Affiliations
[1] Kyoto Univ, Grad Sch Informat, Kyoto 6068501, Japan
Keywords
robot audition; multiple speakers; ICA; missing-feature methods; automatic speech recognition;
DOI
10.1109/IROS.2006.281741
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Robot audition is a critical technology for enabling robots to live in symbiosis with people. Since we hear a mixture of sounds in our daily lives, sound source localization, sound source separation, and recognition of separated sounds are three essential capabilities. Sound source localization has recently been well studied for robots, while the other two capabilities still need extensive study. This paper reports a robot audition system that uses a pair of omnidirectional microphones embedded in a humanoid to recognize two simultaneous talkers. It first separates the sound sources by Independent Component Analysis (ICA) with a single-input multiple-output (SIMO) model. Then, the spectral distortion of the separated sounds is estimated to identify reliable and unreliable components of the spectrogram. This estimation generates the missing-feature masks as spectrographic masks. These masks are then used in automatic speech recognition based on the missing-feature method to avoid the influence of spectral distortion. The novel idea of our system lies in estimating the spectral distortion in the time-frequency domain in terms of feature vectors. In addition, we point out that voice-activity detection (VAD) is effective in overcoming the weakness of ICA when the number of talkers changes. The resulting system outperformed the baseline robot audition system by 15%.
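The masking step in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the 0 dB threshold, and the use of a power ratio between the separated signal and the estimated distortion are all assumptions made for illustration, and the sketch works on a generic feature array rather than the paper's specific feature vectors. It shows a binary mask that marks reliable time-frequency cells, and a diagonal-Gaussian log-likelihood that simply skips the unreliable components (the marginalization form of missing-feature decoding).

```python
import numpy as np


def binary_mask(separated_power, distortion_power, threshold_db=0.0):
    """Return True where a cell is treated as reliable.

    Hypothetical rule: a cell is reliable when the separated-signal power
    exceeds the estimated distortion power by more than threshold_db.
    """
    eps = 1e-12  # avoids division by zero and log of zero
    ratio_db = 10.0 * np.log10((separated_power + eps) / (distortion_power + eps))
    return ratio_db > threshold_db


def masked_log_likelihood(x, mask, mean, var):
    """Diagonal-Gaussian log-likelihood summed over reliable components only."""
    per_dim = -0.5 * (np.log(2.0 * np.pi * var) + (x - mean) ** 2 / var)
    return float(np.sum(per_dim[mask]))


if __name__ == "__main__":
    separated = np.array([[4.0, 1.0], [0.5, 9.0]])   # toy power values
    distortion = np.array([[1.0, 2.0], [1.0, 1.0]])  # toy distortion estimate
    mask = binary_mask(separated, distortion)
    print(mask)

    frame = np.array([0.0, 5.0])
    print(masked_log_likelihood(frame, mask[0], np.zeros(2), np.ones(2)))
```

In this toy usage the second component of `frame` is far from the model mean, but it does not affect the score because its mask entry is False; that is the sense in which the masks keep distorted components from influencing recognition.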
Pages: 878 / +
Page count: 2
Related papers
33 in total
  • [1] Missing-feature approaches in speech recognition
    Raj, B
    Stern, RM
    IEEE SIGNAL PROCESSING MAGAZINE, 2005, 22 (05) : 101 - 116
  • [2] Improving Speech Recognition of Two Simultaneous Speech Signals by Integrating ICA BSS and Automatic Missing Feature Mask Generation
    Takeda, Ryu
    Yamamoto, Shun'ichi
    Komatani, Kazunori
    Ogata, Tetsuya
    Okuno, Hiroshi G.
    INTERSPEECH 2006 AND 9TH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, VOLS 1-5, 2006, : 2302 - 2305
  • [3] Soft Missing-Feature Mask Generation for Simultaneous Speech Recognition System in Robots
    Takahashi, Toru
    Yamamoto, Shun'ichi
    Nakadai, Kazuhiro
    Komatani, Kazunori
    Ogata, Tetsuya
    Okuno, Hiroshi G.
    INTERSPEECH 2008: 9TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2008, VOLS 1-5, 2008, : 992 - +
  • [4] MMSE-Based Missing-Feature Reconstruction With Temporal Modeling for Robust Speech Recognition
    Gonzalez, Jose A.
    Peinado, Antonio M.
    Ma, Ning
    Gomez, Angel M.
    Barker, Jon
    IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2013, 21 (03) : 624 - 635
  • [5] Recognition of convolutive speech mixtures by missing feature techniques for ICA
    Kolossa, Dorothea
    Sawada, Hiroshi
    Astudillo, Ramon Fernandez
    Orglmeister, Reinhold
    Makino, Shoji
    2006 FORTIETH ASILOMAR CONFERENCE ON SIGNALS, SYSTEMS AND COMPUTERS, VOLS 1-5, 2006, : 1397 - +
  • [6] Mask classification for missing-feature reconstruction for robust speech recognition in unknown background noise
    Kim, Wooil
    Stern, Richard M.
    SPEECH COMMUNICATION, 2011, 53 (01) : 1 - 11
  • [7] Speech recognition for a humanoid with motor noise utilizing missing feature theory
    Nishimura, Yoshitaka
    Ishizuka, Mitsuru
    Nakadai, Kazuhiro
    Nakano, Mikio
    Tsujino, Hiroshi
    2006 6TH IEEE-RAS INTERNATIONAL CONFERENCE ON HUMANOID ROBOTS, VOLS 1 AND 2, 2006, : 26 - +
  • [8] Combining Noise Compensation and Missing-Feature Decoding for Large Vocabulary Speech Recognition in Noise
    Lu, Jianhua
    Ming, Ji
    Woods, Roger
    INTERSPEECH 2008: 9TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2008, VOLS 1-5, 2008, : 1269 - 1272
  • [9] Missing-Feature Reconstruction for Band-Limited Speech Recognition in Spoken Document Retrieval
    Kim, Wooil
    Hansen, John H. L.
    INTERSPEECH 2006 AND 9TH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, VOLS 1-5, 2006, : 2306 - 2309
  • [10] Recognition of simultaneous speech by estimating reliability of separated signals for robot audition
    Yamamoto, Shun'ichi
    Takeda, Ryu
    Nakadai, Kazuhiro
    Nakano, Mikio
    Tsujino, Hiroshi
    Valin, Jean-Marc
    Komatani, Kazunori
    Ogata, Tetsuya
    Okuno, Hiroshi G.
    PRICAI 2006: TRENDS IN ARTIFICIAL INTELLIGENCE, PROCEEDINGS, 2006, 4099 : 484 - 494