Improvement of robot audition by interfacing sound source separation and automatic speech recognition with missing feature theory

Cited by: 15
Authors
Yamamoto, S [1]
Nakadai, K [1]
Tsujino, H [1]
Yokoyama, T [1]
Okuno, HG [1]
Affiliations
[1] Kyoto Univ, Grad Sch Informat, Kyoto, Japan
DOI
10.1109/ROBOT.2004.1308039
Chinese Library Classification
TP [Automation and computer technology];
Discipline classification code
0812;
Abstract
We have developed a robot audition system using the active direction-pass filter (ADPF) with the Scattering Theory, and demonstrated that the humanoid SIG could separate and recognize three simultaneous speeches originating from different directions. This is the first result showing that a robot can listen to several things simultaneously. However, its general applicability to other robots has not yet been confirmed. Since automatic speech recognition (ASR) requires direction- and speaker-dependent acoustic models, it is difficult to adapt to various kinds of environments. In addition, ASR with many acoustic models is slow. In this paper, these three problems are resolved. First, we confirmed the generality of the ADPF by applying it to two humanoids, SIG2 and Replie, under different environments. Next, we present a new interface between the ADPF and ASR based on the Missing Feature Theory, which masks broken features of the separated sound to make them unavailable to ASR. This new interface improved the recognition performance for three simultaneous speeches up to about 90%. Finally, since the ASR uses only a single acoustic model that is direction- and speaker-independent and created under clean environments, the processing of the whole system was made very light and fast.
Pages: 1517-1523
Number of pages: 7