Improvement of recognition of simultaneous speech signals using AV integration and scattering theory for humanoid robots

Cited by: 36
Authors
Nakadai, K
Matsuura, D
Okuno, HG
Tsujino, H
Affiliations
[1] Honda Res Inst Japan Co Ltd, Wako, Saitama 3510114, Japan
[2] Tokyo Inst Technol, Grad Sch Sci & Engn, Meguro Ku, Tokyo 1528550, Japan
[3] Kyoto Univ, Grad Sch Informat, Sakyo Ku, Kyoto 6068501, Japan
Keywords
audio-visual integration; robot audition; scattering theory; sound source localization; sound source separation; speech recognition; active audition
DOI
10.1016/j.specom.2004.10.010
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification code
070206; 082403
Abstract
This paper presents a method to improve recognition of three simultaneous speech signals by a humanoid robot equipped with a pair of microphones. In such situations, sound separation and automatic speech recognition (ASR) of the separated speech signal are difficult, because the signal-to-noise ratio is quite low (around -3 dB) and the noise is not stationary due to interfering voices. To improve recognition of three simultaneous speech signals, two key ideas are introduced. One is two-layered audio-visual integration of both name (ID) and location, that is, speech and face recognition, and speech and face localization. The other is acoustical modeling of the humanoid head by scattering theory. Sound sources are separated in real time by an active direction-pass filter (ADPF), which extracts sounds from a specified direction by using the interaural phase/intensity difference estimated by scattering theory. Since features of separated sounds vary according to the sound direction, multiple direction- and speaker-dependent acoustic models are used. The system integrates ASR results by using the sound direction and speaker information provided by face recognition, as well as a confidence measure of the ASR results, to select the best one. The resulting system shows an improvement of about 10% on average in recognition of three simultaneous speech signals, where three speakers were located around the humanoid on a 1 m radius half circle, one of them being in front of it (angle 0°) and the other two being at symmetrical positions (±θ) varying in 10° steps from 0° to 90°. © 2004 Elsevier B.V. All rights reserved.
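The direction-pass idea in the abstract can be illustrated with a minimal sketch. It is not the authors' ADPF: it uses only the interaural phase difference (no intensity difference), and it replaces the scattering-theory head model with a simple free-field two-microphone model. The function names, the 0.18 m microphone spacing, the 0.3 rad tolerance, and the sign convention (positive azimuth means the left channel leads) are all assumptions made for illustration.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value


def expected_ipd(freq_hz, theta_deg, mic_distance_m=0.18):
    """Free-field interaural phase difference (radians) for a far source.

    Assumption: two microphones in open air with no head between them,
    which is NOT the scattering-theory model described in the abstract.
    """
    delay = mic_distance_m * math.sin(math.radians(theta_deg)) / SPEED_OF_SOUND
    return 2.0 * math.pi * freq_hz * delay


def wrap_phase(x):
    """Wrap an angle in radians to the interval [-pi, pi]."""
    return math.atan2(math.sin(x), math.cos(x))


def direction_pass_mask(left, right, freqs_hz, theta_deg, tol_rad=0.3):
    """Mark the frequency bins whose observed IPD matches direction theta_deg.

    left, right: complex spectra of one frame (one value per bin).
    freqs_hz: center frequency of each bin.
    """
    mask = []
    for l, r, f in zip(left, right, freqs_hz):
        observed = cmath.phase(l * r.conjugate())  # phase(left) - phase(right)
        error = wrap_phase(observed - expected_ipd(f, theta_deg))
        mask.append(abs(error) <= tol_rad)
    return mask


def apply_mask(spectrum, mask):
    """Zero out the bins that the mask rejects."""
    return [s if keep else 0j for s, keep in zip(spectrum, mask)]
```

A separated frame for a chosen direction would then be `apply_mask(left, direction_pass_mask(left, right, freqs_hz, theta_deg))`. At higher frequencies the phase difference wraps and becomes ambiguous, which is one reason a practical filter would also need the intensity difference mentioned in the abstract.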
Pages: 97-112
Number of pages: 16
Related papers
50 in total
  • [1] Improvement of Speech Recognition for Robots Using Blind Signal Separation
    Bicher, Daniel
    Kroll-Peters, Olaf
    Lee, Thebin
    Tiotuico, Natascha
    Wilhelm, Mathias
    ISCGAV'08: PROCEEDINGS OF THE 8TH WSEAS INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING, COMPUTATIONAL GEOMETRY AND ARTIFICIAL VISION, 2008, : 52 - 55
  • [2] Integration of Speech and Action in Humanoid Robots: iCub Simulation Experiments
    Tikhanoff, Vadim
    Cangelosi, Angelo
    Metta, Giorgio
    IEEE TRANSACTIONS ON AUTONOMOUS MENTAL DEVELOPMENT, 2011, 3 (01) : 17 - 29
  • [3] Missing-feature based speech recognition for two simultaneous speech signals separated by ICA with a pair of humanoid ears
    Takeda, Ryu
    Yamamoto, Shun'ichi
    Komatani, Kazunori
    Ogata, Tetsuya
    Okuno, Hiroshi G.
    2006 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS, VOLS 1-12, 2006, : 878 - +
  • [4] Intelligent Speech Communication Using Double Humanoid Robots
    Juang, Li-Hong
    Zhao, Yi-Hua
    INTELLIGENT AUTOMATION AND SOFT COMPUTING, 2020, 26 (02): : 291 - 301
  • [5] Zero-Crossing-Based Speech Segregation and Recognition for Humanoid Robots
    An, Sung Jun
    Kil, Rhee Man
    Kim, Young-Ik
    IEEE TRANSACTIONS ON CONSUMER ELECTRONICS, 2009, 55 (04) : 2341 - 2348
  • [6] Integration of Indonesian Speech and Hand Gesture Recognition for Controlling Humanoid Robot
    Fakhrurroja, Hanif
    Riyanto
    Purwarianti, Ayu
    Prihatmanto, Ary Setijadi
    Machbub, Carmadi
    2018 15TH INTERNATIONAL CONFERENCE ON CONTROL, AUTOMATION, ROBOTICS AND VISION (ICARCV), 2018, : 1590 - 1595
  • [7] Speech recognition for a humanoid with motor noise utilizing missing feature theory
    Nishimura, Yoshitaka
    Ishizuka, Mitsuru
    Nakadai, Kazuhiro
    Nakano, Mikio
    Tsujino, Hiroshi
    2006 6TH IEEE-RAS INTERNATIONAL CONFERENCE ON HUMANOID ROBOTS, VOLS 1 AND 2, 2006, : 26 - +
  • [8] Bayesian Integration of Sound Source Separation and Speech Recognition: A New Approach to Simultaneous Speech Recognition
    Itakura, Kousuke
    Nishimuta, Izaya
    Bando, Yoshiaki
    Itoyama, Katsutoshi
    Yoshii, Kazuyoshi
    16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5, 2015, : 736 - 740
  • [9] A new kinetostatic model for humanoid robots using screw theory
    Toscano, Gustavo S.
    Simas, Henrique
    Castelan, Eugenio B.
    Martins, Daniel
    ROBOTICA, 2018, 36 (04) : 570 - 587
  • [10] Kinect microphone array-based speech and speaker recognition for the exhibition control of humanoid robots
    Ding, Ing-Jr
    Shi, Jia-Yi
    COMPUTERS & ELECTRICAL ENGINEERING, 2017, 62 : 719 - 729