Robust speech recognition system for communication robots in real environments

Cited by: 7
Authors
Ishi, Carlos Toshinori [1]
Matsuda, Shigeki [2]
Kanda, Takayuki [1]
Jitsuhiro, Takatoshi [3]
Ishiguro, Hiroshi [1]
Nakamura, Satoshi [2]
Hagita, Norihiro [1]
Affiliations
[1] ATR, Intelligent Robot & Commun Labs, Kyoto, Japan
[2] Natl Inst Informat & Commun Technol, Spoken Language Commun Res Lab, ATR, Kyoto, Japan
[3] Knowledge Sci Lab, ATR, Kyoto, Japan
Keywords
communication robots; speech recognition; robustness; acoustic noise; children's speech
DOI
10.1109/ICHR.2006.321294
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
The application range of communication robots could be widely expanded by the use of an automatic speech recognition (ASR) system with improved robustness to noise and to speakers of different ages. In this paper, we describe an ASR system that can robustly recognize speech by adults and children in noisy environments. We evaluate the ASR system in a communication robot placed in a real noisy environment. Speech is captured using a twelve-element microphone array arranged in the robot's chest. To suppress interference and noise and to attenuate reverberation, we implemented a multi-channel system consisting of an outlier-robust generalized sidelobe canceller (RGSC) technique and feature-space noise suppression using MMSE criteria. Speech activity periods are detected using GMM-based end-point detection (GMM-EPD). Our ASR system has two decoders, one for adults' speech and one for children's speech. The final hypothesis is selected based on posterior probability. We then assign a generalized word posterior probability (GWPP)-based confidence measure to this hypothesis, and if it is higher than a threshold, we transfer it to a subsequent dialog processing module. The performance of each step was evaluated for adults' and children's speech by adding different levels of real environment noise recorded in a cafeteria. Experimental results indicated that our ASR system could achieve over 80% word accuracy in 70 dBA noise. Further evaluation of adult speech recorded in a real noisy environment resulted in 73% word accuracy.
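The last two stages described in the abstract (choosing between the two decoders' outputs by posterior probability, then gating the result on a confidence threshold) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the `Hypothesis` class, the function names, the example scores, and the threshold of 0.5 are all hypothetical, and the `confidence` field merely stands in for a GWPP-based score, which is not computed here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Hypothesis:
    """One decoder's output (hypothetical structure, for illustration only)."""
    text: str
    posterior: float   # decoder-level posterior probability, assumed in [0, 1]
    confidence: float  # stand-in for a GWPP-based confidence score


def select_hypothesis(adult: Hypothesis, child: Hypothesis) -> Hypothesis:
    """Keep whichever decoder's hypothesis has the higher posterior."""
    return adult if adult.posterior >= child.posterior else child


def accept(hyp: Hypothesis, threshold: float = 0.5) -> Optional[str]:
    """Pass the text on to dialog processing only if confidence exceeds the threshold."""
    return hyp.text if hyp.confidence > threshold else None


# Made-up scores for the adult-speech and children's-speech decoders.
adult = Hypothesis("hello robot", posterior=0.62, confidence=0.81)
child = Hypothesis("yellow robot", posterior=0.38, confidence=0.40)

best = select_hypothesis(adult, child)
result = accept(best, threshold=0.5)
```

Returning `None` for a rejected hypothesis lets a downstream dialog module distinguish "nothing reliable was recognized" from an empty utterance; the actual threshold value is not given in the abstract.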
Pages: 340 / +
Number of pages: 2
Related papers
50 items in total
  • [41] ROBUST SPEECH RECOGNITION UNDER NOISY ENVIRONMENTS USING ASYMMETRIC TAPERS
    Alam, Md Jahangir
    Kenny, Patrick
    O'Shaughnessy, Douglas
    2012 PROCEEDINGS OF THE 20TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO), 2012, : 1638 - 1642
  • [42] Silence Energy Normalization for Robust Speech Recognition in Additive Noise Environments
    Tai, Chung-fu
    Hung, Jeih-weih
    INTERSPEECH 2006 AND 9TH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, VOLS 1-5, 2006, : 2558 - 2561
  • [43] Silence Feature Normalization for Robust Speech Recognition in Additive Noise Environments
    Wang, Chieh-cheng
    Pan, Chi-an
    Hung, Jeih-weih
    INTERSPEECH 2008: 9TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2008, VOLS 1-5, 2008, : 1028 - 1031
  • [44] Robust technologies towards automatic speech recognition in car noise environments
    Ding, Pei
    He, Lei
    Yan, Xiang
    Zhao, Rui
    Hao, Jie
    2006 8TH INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING, VOLS 1-4, 2006, : 776 - +
  • [45] Blind source extraction for robust speech recognition in multisource noisy environments
    Nesta, Francesco
    Matassoni, Marco
    COMPUTER SPEECH AND LANGUAGE, 2013, 27 (03): : 703 - 725
  • [46] Robust automatic speech recognition based on neural network in reverberant environments
    Bai, L.
    Li, H. L.
    He, Y. Y.
    CIVIL, ARCHITECTURE AND ENVIRONMENTAL ENGINEERING, VOLS 1 AND 2, 2017, : 1319 - 1324
  • [47] Recursive estimation of time-varying environments for robust speech recognition
    Zhao, YX
    Wang, SJ
    Yen, KC
    2001 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS I-VI, PROCEEDINGS: VOL I: SPEECH PROCESSING 1; VOL II: SPEECH PROCESSING 2 IND TECHNOL TRACK DESIGN & IMPLEMENTATION OF SIGNAL PROCESSING SYSTEMS NEURALNETWORKS FOR SIGNAL PROCESSING; VOL III: IMAGE & MULTIDIMENSIONAL SIGNAL PROCESSING MULTIMEDIA SIGNAL PROCESSING - VOL IV: SIGNAL PROCESSING FOR COMMUNICATIONS; VOL V: SIGNAL PROCESSING EDUCATION SENSOR ARRAY & MULTICHANNEL SIGNAL PROCESSING AUDIO & ELECTROACOUSTICS; VOL VI: SIGNAL PROCESSING THEORY & METHODS STUDENT FORUM, 2001, : 225 - 228
  • [48] Development of noise robust real time automatic speech recognition system for Kannada language/dialects
    Yadava, G. Thimmaraja
    Nagaraja, B. G.
    Jayanna, H. S.
    ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2024, 135
  • [49] Speech Intent Recognition for Robots
    Shen, Borui
    Inkpen, Diana
    PROCEEDINGS OF THE 3RD INTERNATIONAL CONFERENCE ON MATHEMATICS AND COMPUTERS IN SCIENCES AND IN INDUSTRY (MCSI 2016), 2016, : 185 - 190
  • [50] Speech recognition in parallel robots
    Jun, Tao
    PROCEEDINGS OF THE 2019 INTERNATIONAL CONFERENCE ON ROBOTICS, INTELLIGENT CONTROL AND ARTIFICIAL INTELLIGENCE (RICAI 2019), 2019, : 733 - 737