Robust speech recognition system for communication robots in real environments

Cited by: 7
Authors
Ishi, Carlos Toshinori [1]
Matsuda, Shigeki [2]
Kanda, Takayuki [1]
Jitsuhiro, Takatoshi [3]
Ishiguro, Hiroshi [1]
Nakamura, Satoshi [2]
Hagita, Norihiro [1]
Affiliations
[1] ATR, Intelligent Robot & Commun Labs, Kyoto, Japan
[2] Natl Inst Informat & Commun Technol, Spoken Language Commun Res Lab, ATR, Kyoto, Japan
[3] Knowledge Sci Lab, ATR, Kyoto, Japan
Keywords
communication robots; speech recognition; robustness; acoustic noise; children speech
DOI
10.1109/ICHR.2006.321294
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline code
0812
Abstract
The application range of communication robots could be widely expanded by an automatic speech recognition (ASR) system that is robust to noise and to speakers of different ages. In this paper, we describe an ASR system that can robustly recognize speech by adults and children in noisy environments, and we evaluate it on a communication robot placed in a real noisy environment. Speech is captured by a twelve-element microphone array arranged in the robot's chest. To suppress interference and noise and to attenuate reverberation, we implemented a multi-channel system consisting of an outlier-robust generalized sidelobe canceller (RGSC) and feature-space noise suppression based on the minimum mean-square error (MMSE) criterion. Speech activity periods are detected using Gaussian mixture model (GMM)-based end-point detection (GMM-EPD). The ASR system has two decoders, one for adults' speech and one for children's speech, and the final hypothesis is selected based on posterior probability. We then assign a confidence measure based on the generalized word posterior probability (GWPP) to this hypothesis and pass it to the subsequent dialog-processing module only if the confidence exceeds a threshold. The performance of each step was evaluated on adults' and children's speech to which real environmental noise recorded in a cafeteria was added at different levels. Experimental results indicated that the ASR system achieved over 80% word accuracy in 70 dBA noise. A further evaluation on adult speech recorded in a real noisy environment yielded 73% word accuracy.
Pages: 340 / +
Number of pages: 2
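The abstract's last processing steps (choosing between the adult and child decoders by posterior probability, then forwarding the hypothesis only if its confidence clears a threshold) can be illustrated with a minimal Python sketch. It is a simplification under stated assumptions, not the authors' implementation: each decoder is assumed to return an N-best list of (decoder label, word sequence, log score) tuples, the two lists are pooled and softmax-normalized into sentence posteriors, and each word's confidence is approximated by summing the posteriors of the hypotheses that contain it. A true GWPP is computed on word graphs and also requires time overlap between word instances, which this sketch ignores. The names `Hypothesis`, `sentence_posteriors`, `word_confidences` and `select_and_gate`, the score `scale`, and the 0.7 threshold are all hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple

# Hypothetical N-best entry: (decoder label, word sequence, total log score).
Hypothesis = Tuple[str, Tuple[str, ...], float]


def sentence_posteriors(nbest: Sequence[Hypothesis], scale: float = 1.0) -> List[float]:
    """Softmax-normalize scaled log scores into sentence posteriors.

    Subtracting the maximum before exponentiating keeps the computation stable.
    """
    scaled = [scale * score for _, _, score in nbest]
    top = max(scaled)
    exps = [math.exp(s - top) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def word_confidences(
    best_words: Sequence[str],
    nbest: Sequence[Hypothesis],
    posteriors: Sequence[float],
) -> List[float]:
    """Approximate each word's posterior as the summed posterior of the
    hypotheses that contain it (no time alignment, unlike a true GWPP)."""
    return [
        sum(p for (_, words, _), p in zip(nbest, posteriors) if w in words)
        for w in best_words
    ]


def select_and_gate(
    adult_nbest: Sequence[Hypothesis],
    child_nbest: Sequence[Hypothesis],
    threshold: float = 0.7,
    scale: float = 1.0,
) -> Optional[Tuple[str, Tuple[str, ...], float]]:
    """Pool both decoders' N-best lists, pick the maximum-posterior hypothesis,
    and return it only if its mean word confidence reaches the threshold."""
    pool = list(adult_nbest) + list(child_nbest)
    if not pool:
        return None
    posts = sentence_posteriors(pool, scale)
    best = max(range(len(pool)), key=lambda i: posts[i])
    decoder, words, _ = pool[best]
    confs = word_confidences(words, pool, posts)
    utterance_conf = sum(confs) / len(confs) if confs else 0.0
    if utterance_conf >= threshold:
        return decoder, words, utterance_conf
    return None  # rejected: not forwarded to dialog processing


if __name__ == "__main__":
    adult = [("adult", ("hello", "robot"), -10.0), ("adult", ("yellow", "robot"), -12.0)]
    child = [("child", ("hello", "robot"), -11.0)]
    print(select_and_gate(adult, child, threshold=0.7))   # accepted, confidence ~0.95
    print(select_and_gate(adult, child, threshold=0.99))  # None (rejected)
```

With these toy lists, the adult decoder's "hello robot" has the highest posterior and a mean word confidence of about 0.95, so it is accepted at a 0.7 threshold and rejected at 0.99. Pooling both lists before normalizing is one simple way to compare the two decoders on a common posterior scale; other selection rules are equally plausible.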