Robust speech recognition system for communication robots in real environments

Cited by: 7
Authors
Ishi, Carlos Toshinori [1]
Matsuda, Shigeki [2]
Kanda, Takayuki [1]
Jitsuhiro, Takatoshi [3]
Ishiguro, Hiroshi [1]
Nakamura, Satoshi [2]
Hagita, Norihiro [1]
Affiliations
[1] ATR, Intelligent Robot & Commun Labs, Kyoto, Japan
[2] Natl Inst Informat & Commun Technol, Spoken Language Commun Res Lab, ATR, Kyoto, Japan
[3] Knowledge Sci Lab, ATR, Kyoto, Japan
Keywords
communication robots; speech recognition; robustness; acoustic noise; children speech
DOI
10.1109/ICHR.2006.321294
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
The application range of communication robots could be widely expanded by the use of an automatic speech recognition (ASR) system with improved robustness to noise and to speakers of different ages. In this paper, we describe an ASR system that can robustly recognize speech by adults and children in noisy environments. We evaluate the ASR system in a communication robot placed in a real noisy environment. Speech is captured using a twelve-element microphone array arranged in the robot's chest. To suppress interference and noise and to attenuate reverberation, we implemented a multi-channel system consisting of an outlier-robust generalized sidelobe canceller (RGSC) technique and feature-space noise suppression using MMSE criteria. Speech activity periods are detected using GMM-based end-point detection (GMM-EPD). Our ASR system has two decoders, one for adults' speech and one for children's speech. The final hypothesis is selected based on posterior probability. We then assign a generalized word posterior probability (GWPP)-based confidence measure to this hypothesis, and if it is higher than a threshold, we transfer it to a subsequent dialog processing module. The performance of each step was evaluated for adults' and children's speech by adding different levels of real environment noise recorded in a cafeteria. Experimental results indicated that our ASR system could achieve over 80% word accuracy in 70 dBA noise. Further evaluation of adult speech recorded in a real noisy environment resulted in 73% word accuracy.
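The last stage described in the abstract (two decoders, selection by posterior probability, then a confidence threshold before the dialog module) might be sketched as follows. This is only an illustrative sketch, not the authors' implementation: the `Hypothesis` class, the function names, the tie-breaking rule, and all numeric values are hypothetical, and the `confidence` field merely stands in for a GWPP-based score computed elsewhere.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Hypothesis:
    """One decoder's output (hypothetical structure, for illustration only)."""
    decoder: str       # "adult" or "child"
    text: str          # recognized word sequence
    posterior: float   # posterior probability used to pick between decoders
    confidence: float  # placeholder for a GWPP-based confidence measure


def select_hypothesis(adult: Hypothesis, child: Hypothesis) -> Hypothesis:
    """Pick the hypothesis with the higher posterior probability.

    Ties go to the adult decoder; that tie-break is an assumption.
    """
    return adult if adult.posterior >= child.posterior else child


def forward_to_dialog(hyp: Hypothesis, threshold: float) -> Optional[str]:
    """Return the text only if its confidence is strictly above the threshold."""
    return hyp.text if hyp.confidence > threshold else None


# Usage with made-up scores.
adult = Hypothesis("adult", "hello robot", posterior=0.62, confidence=0.81)
child = Hypothesis("child", "yellow robot", posterior=0.38, confidence=0.40)
best = select_hypothesis(adult, child)
result = forward_to_dialog(best, threshold=0.5)
```

Keeping selection and rejection as separate steps mirrors the order given in the abstract: the decoder choice is made first, and the confidence check is applied only to the selected hypothesis.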
Pages: 340 / +
Number of pages: 2