Missing-feature based speech recognition for two simultaneous speech signals separated by ICA with a pair of humanoid ears

Cited by: 6
Authors
Takeda, Ryu [1]
Yamamoto, Shun'ichi [1]
Komatani, Kazunori [1]
Ogata, Tetsuya [1]
Okuno, Hiroshi G. [1]
Affiliations
[1] Kyoto Univ, Grad Sch Informat, Kyoto 6068501, Japan
Keywords
robot audition; multiple speakers; ICA; missing-feature methods; automatic speech recognition
DOI
10.1109/IROS.2006.281741
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Robot audition is a critical technology for enabling robots to coexist symbiotically with people. Since we hear a mixture of sounds in our daily lives, sound source localization, sound source separation, and recognition of the separated sounds are three essential capabilities. Sound source localization has recently been well studied for robots, while the other two capabilities still need extensive study. This paper reports a robot audition system that uses a pair of omni-directional microphones embedded in a humanoid to recognize two simultaneous talkers. It first separates the sound sources by Independent Component Analysis (ICA) with a single-input multiple-output (SIMO) model. Then, the spectral distortion of the separated sounds is estimated to identify reliable and unreliable components of the spectrogram. This estimate is used to generate missing-feature masks in the form of spectrographic masks. These masks are then used to avoid the influence of spectral distortion in automatic speech recognition based on the missing-feature method. The novelty of our system lies in estimating spectral distortion in the time-frequency domain in terms of feature vectors. In addition, we point out that voice-activity detection (VAD) is effective in overcoming the weakness of ICA against a changing number of talkers. The resulting system outperformed the baseline robot audition system by 15%.
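The masking step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how distortion is estimated or whether the masks are binary, so the code assumes a distortion power estimate is already available per time-frequency cell, applies a hypothetical dB threshold to obtain a binary mask, and scores features against a diagonal Gaussian while ignoring unreliable dimensions (marginalization). The function names `missing_feature_mask` and `masked_log_likelihood` and the parameter `threshold_db` are illustrative assumptions.

```python
import numpy as np


def missing_feature_mask(separated_power, distortion_power, threshold_db=0.0):
    """Binary mask: 1.0 where the separated signal exceeds the estimated
    distortion by more than threshold_db, else 0.0 (assumed rule)."""
    eps = 1e-12  # avoids division by zero and log of zero
    ratio_db = 10.0 * np.log10((separated_power + eps) / (distortion_power + eps))
    return (ratio_db > threshold_db).astype(float)


def masked_log_likelihood(features, mask, mean, var):
    """Per-frame diagonal-Gaussian log-likelihood using reliable dimensions only.

    An unreliable dimension is marginalized out; for a diagonal Gaussian this
    means its term contributes 0 to the log-likelihood.
    """
    per_dim = -0.5 * (np.log(2.0 * np.pi * var) + (features - mean) ** 2 / var)
    return np.sum(mask * per_dim, axis=-1)


if __name__ == "__main__":
    # Toy example: 2 frames x 2 frequency bins (made-up numbers).
    separated = np.array([[4.0, 1.0], [0.5, 9.0]])
    distortion = np.array([[1.0, 2.0], [1.0, 1.0]])
    mask = missing_feature_mask(separated, distortion)
    scores = masked_log_likelihood(np.zeros((2, 2)), mask,
                                   mean=np.zeros(2), var=np.ones(2))
    print(mask)
    print(scores)
```

In a recognizer, a score of this kind would replace the usual per-state acoustic likelihood, so that distorted spectrogram cells do not penalize the correct hypothesis.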
Pages: 878 / +
Page count: 2
Related papers
33 in total
  • [21] Coarse speech recognition by audio-visual integration based on missing feature theory
    Koiwa, Tomoaki
    Nakadai, Kazuhiro
    Imura, Jun-ichi
    2007 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS, VOLS 1-9, 2007, : 1757 - 1762
  • [22] DNN-Based Feature Enhancement Using DOA-Constrained ICA for Robust Speech Recognition
    Lee, Ho-Yong
    Cho, Ji-Won
    Kim, Minook
    Park, Hyung-Min
    IEEE SIGNAL PROCESSING LETTERS, 2016, 23 (08) : 1091 - 1095
  • [23] Enhanced robot speech recognition based on microphone array source separation and missing feature theory
    Yamamoto, S
    Valin, JM
    Nakadai, K
    Rouat, J
    Michaud, F
    Ogata, T
    Okuno, HG
    2005 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION (ICRA), VOLS 1-4, 2005, : 1477 - 1482
  • [24] Exploiting known sound source signals to improve ICA-based robot audition in speech separation and recognition
    Takeda, Ryu
    Nakadai, Kazuhiro
    Komatani, Kazunori
    Ogata, Tetsuya
    Okuno, Hiroshi G.
    2007 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS, VOLS 1-9, 2007, : 1763 - +
  • [25] Particle Swarm Optimization Based Feature Enhancement and Feature Selection for Improved Emotion Recognition in Speech and Glottal Signals
    Muthusamy, Hariharan
    Polat, Kemal
    Yaacob, Sazali
    PLOS ONE, 2015, 10 (03):
  • [26] Mask Estimation in Non-stationary Noise Environments for Missing Feature Based Robust Speech Recognition
    Badiezadegan, Shirin
    Rose, Richard C.
    11TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2010 (INTERSPEECH 2010), VOLS 3 AND 4, 2010, : 2062 - 2065
  • [27] Efficient feature extraction and de-noising method for Chinese speech signals using GGM-based ICA
    Bin, Y
    Wei, K
    PROGRESS IN PATTERN RECOGNITION, IMAGE ANALYSIS AND APPLICATIONS, PROCEEDINGS, 2005, 3773 : 925 - 932
  • [28] Two-stage model-based feature compensation for robust speech recognition
    Haifeng Shen
    Gang Liu
    Jun Guo
    Computing, 2012, 94 : 1 - 20
  • [29] Two-stage model-based feature compensation for robust speech recognition
    Shen, Haifeng
    Liu, Gang
    Guo, Jun
    COMPUTING, 2012, 94 (01) : 1 - 20
  • [30] VTS feature compensation based on two-layer GMM structure for robust speech recognition
    Zhou, Lin
    Li, Haijing
    Chen, Ying
    Wu, Zhenyang
    Lu, Yong
    2016 8TH INTERNATIONAL CONFERENCE ON WIRELESS COMMUNICATIONS & SIGNAL PROCESSING (WCSP), 2016,