Improvement of recognition of simultaneous speech signals using AV integration and scattering theory for humanoid robots

Cited by: 36
Authors
Nakadai, K
Matsuura, D
Okuno, HG
Tsujino, H
Affiliations
[1] Honda Res Inst Japan Co Ltd, Wako, Saitama 3510114, Japan
[2] Tokyo Inst Technol, Grad Sch Sci & Engn, Meguro Ku, Tokyo 1528550, Japan
[3] Kyoto Univ, Grad Sch Informat, Sakyo Ku, Kyoto 6068501, Japan
Keywords
audio-visual integration; robot audition; scattering theory; sound source localization; sound source separation; speech recognition; active audition
DOI
10.1016/j.specom.2004.10.010
Chinese Library Classification (CLC) number
O42 [Acoustics];
Discipline classification codes
070206 ; 082403 ;
Abstract
This paper presents a method to improve recognition of three simultaneous speech signals by a humanoid robot equipped with a pair of microphones. In such situations, sound separation and automatic speech recognition (ASR) of the separated speech signal are difficult, because the signal-to-noise ratio is quite low (around -3 dB) and the noise is not stationary due to interfering voices. To improve recognition of three simultaneous speech signals, two key ideas are introduced. One is two-layered audio-visual integration of both name (ID) and location, that is, speech and face recognition, and speech and face localization. The other is acoustical modeling of the humanoid head by scattering theory. Sound sources are separated in real time by an active direction-pass filter (ADPF), which extracts sounds from a specified direction by using the interaural phase/intensity difference estimated by scattering theory. Since features of separated sounds vary according to the sound direction, multiple direction- and speaker-dependent acoustic models are used. The system integrates ASR results by using the sound direction and speaker information provided by face recognition, as well as a confidence measure of the ASR results, to select the best one. The resulting system shows an improvement of about 10% on average in the recognition of three simultaneous speech signals, where three speakers were located around the humanoid on a 1 m radius half circle, one of them being in front of it (angle 0°) and the other two being at symmetrical positions (±θ), with θ varying in 10° steps from 0° to 90°. © 2004 Elsevier B.V. All rights reserved.
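The direction-pass idea in the abstract can be illustrated with a minimal sketch. It is not the authors' ADPF: the expected interaural phase difference (IPD) here comes from a plain free-field delay model rather than scattering theory, the intensity difference is ignored, and the microphone spacing, tolerance, sign convention (positive azimuth toward the left microphone), and all function names are hypothetical choices made for illustration.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at room temperature
MIC_DISTANCE = 0.18     # m, hypothetical spacing between the two microphones


def expected_ipd(freq_hz, theta_deg):
    """Free-field IPD in radians for a source at azimuth theta_deg.

    Stand-in model: the paper estimates IPD with scattering theory on the
    robot head; this simple two-microphone delay model is an assumption.
    Positive theta is taken to be toward the left microphone.
    """
    delay = MIC_DISTANCE * math.sin(math.radians(theta_deg)) / SPEED_OF_SOUND
    return 2.0 * math.pi * freq_hz * delay


def wrap_phase(phi):
    """Wrap an angle in radians into the interval [-pi, pi)."""
    return (phi + math.pi) % (2.0 * math.pi) - math.pi


def direction_pass_filter(left, right, freqs_hz, theta_deg, tol_rad=0.3):
    """Keep left-channel bins whose observed IPD matches the target direction.

    left, right: complex spectra (one value per frequency bin).
    Bins whose IPD error exceeds tol_rad are set to zero.
    """
    kept = []
    for l_bin, r_bin, freq in zip(left, right, freqs_hz):
        observed = cmath.phase(l_bin * r_bin.conjugate())  # phase(L) - phase(R)
        error = wrap_phase(observed - expected_ipd(freq, theta_deg))
        kept.append(l_bin if abs(error) <= tol_rad else 0j)
    return kept
```

With this spacing, a phase-only test like this is unambiguous only at low frequencies (roughly below c / (2d)); above that the wrapped phase matches several directions, which is one reason a practical filter would need additional cues such as the intensity difference.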
Pages: 97-112
Number of pages: 16