Improvement of robot audition by interfacing sound source separation and automatic speech recognition with missing feature theory

Cited by: 15
Authors
Yamamoto, S [1]
Nakadai, K [1]
Tsujino, H [1]
Yokoyama, T [1]
Okuno, HG [1]
Affiliations
[1] Kyoto Univ, Grad Sch Informat, Kyoto, Japan
DOI
10.1109/ROBOT.2004.1308039
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
We have developed a robot audition system using the active direction-pass filter (ADPF) with the Scattering Theory, and demonstrated that the humanoid SIG could separate and recognize three simultaneous speeches originating from different directions. This was the first result showing that a robot can listen to several things simultaneously. However, three problems remained. Its general applicability to other robots had not yet been confirmed. Because automatic speech recognition (ASR) required direction- and speaker-dependent acoustic models, it was difficult to adapt the system to various kinds of environments. In addition, ASR with many acoustic models made processing slow. In this paper, these three problems are resolved. First, we confirmed the generality of the ADPF by applying it to two humanoids, SIG2 and Replie, under different environments. Next, we present a new interface between the ADPF and ASR based on the Missing Feature Theory, which masks the broken features of separated sound so that ASR does not use them. This new interface improved the recognition performance for three simultaneous speeches up to about 90%. Finally, since the ASR uses only a single acoustic model that is direction- and speaker-independent and created under clean environments, the processing of the whole system became very light and fast.
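The masking idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the mask is computed or how the recognizer consumes it. The sketch assumes a binary reliability mask and a diagonal-Gaussian acoustic score in which unreliable features are simply skipped (marginalization), a common form of missing-feature scoring. The function name `masked_log_likelihood` and all numeric values are hypothetical.

```python
import math


def masked_log_likelihood(features, mask, means, variances):
    """Score one feature frame under a diagonal Gaussian.

    Only dimensions whose mask entry is True (assumed reliable) contribute;
    masked-out dimensions are skipped, i.e. marginalized out.
    """
    total = 0.0
    for x, reliable, mu, var in zip(features, mask, means, variances):
        if not reliable:
            continue  # broken feature: hidden from the recognizer
        total += -0.5 * (math.log(2.0 * math.pi * var) + (x - mu) ** 2 / var)
    return total


# Hypothetical frame: the second feature was corrupted by separation leakage.
frame = [0.0, 10.0]
mask = [True, False]      # assumed output of the separation stage
no_mask = [True, True]

# Two hypothetical acoustic models (means, variances).
model_a = ([0.0, 0.0], [1.0, 1.0])   # the true source
model_b = ([3.0, 10.0], [1.0, 1.0])  # a competing model

score_a_masked = masked_log_likelihood(frame, mask, *model_a)
score_b_masked = masked_log_likelihood(frame, mask, *model_b)
score_a_plain = masked_log_likelihood(frame, no_mask, *model_a)
score_b_plain = masked_log_likelihood(frame, no_mask, *model_b)
```

In this toy case the unmasked scores favor the competing model because of the corrupted second feature, while the masked scores favor the true source. That is the effect the interface is meant to achieve.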
Pages: 1517-1523
Number of pages: 7
Related papers
39 items in total
  • [31] Assessment of General Applicability of Ego Noise Estimation - Applications to Automatic Speech Recognition and Sound Source Localization
    Ince, Goekhan
    Nakamura, Keisuke
    Asano, Futoshi
    Nakajima, Hirofumi
    Nakadai, Kazuhiro
    2011 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION (ICRA), 2011,
  • [32] Missing-Feature-Theory-based Robust Simultaneous Speech Recognition System with Non-clean Speech Acoustic Model
    Takahashi, Toru
    Nakadai, Kazuhiro
    Komatani, Kazunori
    Ogata, Tetsuya
    Okuno, Hiroshi G.
    2009 IEEE-RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS, 2009, : 2730 - 2735
  • [33] Multi-talker Speech Recognition under Ego-motion Noise using Missing Feature Theory
    Ince, Goekhan
    Nakadai, Kazuhiro
    Rodemann, Tobias
    Tsujino, Hiroshi
    Imura, Jun-ichi
    IEEE/RSJ 2010 INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS (IROS 2010), 2010, : 982 - 987
  • [34] A missing data-based feature fusion strategy for noise-robust automatic speech recognition using noisy sensors
    Demiroglu, Cenk
    Anderson, David V.
    Clements, Mark. A.
    2007 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS, VOLS 1-11, 2007, : 965 - 968
  • [35] Non-negative Matrix Based Optimization Scheme for Blind Source Separation in Automatic Speech Recognition System
    Santosh, Kumar S.
    Bharathi, S. H.
    Archana, M.
    PROCEEDINGS OF THE 2016 INTERNATIONAL CONFERENCE ON COMMUNICATION AND ELECTRONICS SYSTEMS (ICCES), 2016, : 782 - 787
  • [36] Joint Blind Source Separation and Dereverberation for Automatic Speech Recognition using Delayed-Subsource MNMF with Localization Prior
    Fras, Mieszko
    Witkowski, Marcin
    Kowalczyk, Konrad
    INTERSPEECH 2023, 2023, : 3734 - 3738
  • [37] Acoustic Model Combination Incorporated With Mask-Based Multi-Channel Source Separation for Automatic Speech Recognition
    Yoon, Jae Sam
    Park, Ji Hun
    Kim, Hong Kook
    Kim, Hoirin
    IEEE JOURNAL OF SELECTED TOPICS IN SIGNAL PROCESSING, 2010, 4 (05) : 772 - 784
  • [38] Automatic Multi-Speaker Speech Recognition System Based on Time-Frequency Blind Source Separation under Ubiquitous Environment
    Wang, Zhe
    Zhang, Haijian
    Bi, Guoan
    Li, Xiumei
    PROCEEDINGS OF THE 2014 9TH IEEE CONFERENCE ON INDUSTRIAL ELECTRONICS AND APPLICATIONS (ICIEA), 2014, : 101 - +