Improvement of recognition of simultaneous speech signals using AV integration and scattering theory for humanoid robots

Cited by: 36
Authors
Nakadai, K
Matsuura, D
Okuno, HG
Tsujino, H
Affiliations
[1] Honda Res Inst Japan Co Ltd, Wako, Saitama 3510114, Japan
[2] Tokyo Inst Technol, Grad Sch Sci & Engn, Meguro Ku, Tokyo 1528550, Japan
[3] Kyoto Univ, Grad Sch Informat, Sakyo Ku, Kyoto 6068501, Japan
Keywords
audio-visual integration; robot audition; scattering theory; sound source localization; sound source separation; speech recognition; active audition
DOI
10.1016/j.specom.2004.10.010
Chinese Library Classification (CLC) number
O42 [Acoustics];
Discipline classification code
070206 ; 082403 ;
Abstract
This paper presents a method to improve recognition of three simultaneous speech signals by a humanoid robot equipped with a pair of microphones. In such situations, sound separation and automatic speech recognition (ASR) of the separated speech signal are difficult, because the signal-to-noise ratio is quite low (around -3 dB) and the noise is not stable due to interfering voices. To improve recognition of three simultaneous speech signals, two key ideas are introduced. One is two-layered audio-visual integration of both name (ID) and location, that is, speech and face recognition, and speech and face localization. The other is acoustical modeling of the humanoid head by scattering theory. Sound sources are separated in real time by an active direction-pass filter (ADPF), which extracts sounds from a specified direction by using the interaural phase/intensity difference estimated by scattering theory. Since features of separated sounds vary according to the sound direction, multiple direction- and speaker-dependent acoustic models are used. The system integrates ASR results by using the sound direction and speaker information provided by face recognition, as well as the confidence measure of the ASR results, to select the best one. The resulting system improves recognition of three simultaneous speech signals by about 10% on average, where three speakers were located around the humanoid on a 1 m radius half circle, one of them being in front of it (angle 0°) and the other two being at symmetrical positions (±θ) varying in 10° steps from 0° to 90°. © 2004 Elsevier B.V. All rights reserved.
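The direction-pass idea in the abstract can be illustrated with a minimal sketch. It is not the paper's ADPF: it replaces the scattering-theory head model with a simple free-field delay model, uses only the interaural phase difference (no intensity difference), and works on one frame of already-computed frequency bins. The function names, the 0.18 m microphone spacing, the 343 m/s speed of sound, and the 0.3 rad tolerance are all assumptions chosen for illustration, and θ is taken as positive toward the left microphone.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s; assumed value for air at room temperature


def expected_ipd(freq_hz, theta_deg, mic_distance_m):
    """Free-field interaural phase difference (left minus right, radians).

    Assumption: theta is positive toward the left microphone, so the
    left channel leads for positive theta.
    """
    delay = mic_distance_m * math.sin(math.radians(theta_deg)) / SPEED_OF_SOUND
    return 2.0 * math.pi * freq_hz * delay


def wrap_phase(phi):
    """Wrap an angle in radians to the interval [-pi, pi]."""
    return math.atan2(math.sin(phi), math.cos(phi))


def direction_pass(left_bins, right_bins, freqs_hz, theta_deg,
                   mic_distance_m=0.18, tol_rad=0.3):
    """Keep left-channel bins whose observed phase difference matches theta.

    left_bins and right_bins are complex spectra of one frame; bins whose
    phase difference deviates from the expected value by more than
    tol_rad are zeroed.
    """
    passed = []
    for l, r, f in zip(left_bins, right_bins, freqs_hz):
        observed = cmath.phase(l * r.conjugate())  # phase(left) - phase(right)
        error = wrap_phase(observed - expected_ipd(f, theta_deg, mic_distance_m))
        passed.append(l if abs(error) <= tol_rad else 0j)
    return passed
```

Applying `direction_pass` once per target direction yields one separated spectrum per speaker. In the system described above, the expected phase and intensity differences would instead come from the scattering-theory model of the head.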
Pages: 97-112
Page count: 16
Related papers
50 in total
  • [21] Disordered Speech Recognition Using Acoustic and sEMG Signals
    Deng, Yunbin
    Patel, Rupal
    Heaton, James T.
    Colby, Glen
    Gilmore, L. Donald
    Cabrera, Joao
    Roy, Serge H.
    De Luca, Carlo J.
    Meltzner, Geoffrey S.
    INTERSPEECH 2009: 10TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2009, VOLS 1-5, 2009, : 632 - +
  • [22] Soft Missing-Feature Mask Generation for Simultaneous Speech Recognition System in Robots
    Takahashi, Toru
    Yamamoto, Shun'ichi
    Nakadai, Kazuhiro
    Komatani, Kazunori
    Ogata, Tetsuya
    Okuno, Hiroshi G.
    INTERSPEECH 2008: 9TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2008, VOLS 1-5, 2008, : 992 - +
  • [23] Indonesian Speech Recognition Grammar Using Kinect 2.0 for Controlling Humanoid Robot
    Tambunan, Mario Herryn
    Martin
    Fakhruroja, Hanif
    Riyanto
    Machbub, Carmadi
    2018 INTERNATIONAL CONFERENCE ON SIGNALS AND SYSTEMS (ICSIGSYS), 2018, : 59 - 63
  • [24] Performance improvement in speech recognition using multimodal features
    Kim, Myung Won
    Song, Won Moon
    Kim, Young Jin
    Kim, Eun Ju
    ICNC 2007: THIRD INTERNATIONAL CONFERENCE ON NATURAL COMPUTATION, VOL 2, PROCEEDINGS, 2007, : 686 - +
  • [25] INTEGRATION OF ACOUSTIC AND VISUAL SPEECH SIGNALS USING NEURAL NETWORKS
    YUHAS, BP
    GOLDSTEIN, MH
    SEJNOWSKI, TJ
    IEEE COMMUNICATIONS MAGAZINE, 1989, 27 (11) : 65 - 71
  • [26] Evaluation of blind separated signals using speech recognition system
    Eksler, V
    Eurocon 2005: The International Conference on Computer as a Tool, Vol 1 and 2, Proceedings, 2005, : 1650 - 1653
  • [27] Multiexpert automatic speech recognition using acoustic and myoelectric signals
    Chan, ADC
    Englehart, KB
    Hudgins, B
    Lovely, DF
    IEEE TRANSACTIONS ON BIOMEDICAL ENGINEERING, 2006, 53 (04) : 676 - 685
  • [28] Recognition of Isolated Speech Signals using Simplified Statistical Parameters
    Mitra, Abhijit
    Mitra, Bhargav Kumar
    Chatterjee, Biswajoy
    PROCEEDINGS OF WORLD ACADEMY OF SCIENCE, ENGINEERING AND TECHNOLOGY, VOL 8, 2005, 8 : 151 - 154
  • [29] Age Recognition Based on Speech Signals using Weights Supervector
    Porat, Royi
    Lange, Dan
    Zigel, Yaniv
    11TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2010 (INTERSPEECH 2010), VOLS 3 AND 4, 2010, : 2818 - 2821
  • [30] Improvement Of Speech Emotion Recognition with Neural Network Classifier by Using Speech Spectrogram
    Prasomphan, Sathit
    2015 INTERNATIONAL CONFERENCE ON SYSTEMS, SIGNALS AND IMAGE PROCESSING (IWSSIP 2015), 2015, : 73 - 76