Sound Source Separation for Robot Audition using Deep Learning

Cited by: 0
Authors
Noda, Kuniaki [1 ]
Hashimoto, Naoya [1 ]
Nakadai, Kazuhiro [2 ]
Ogata, Tetsuya [1 ]
Affiliations
[1] Waseda Univ, Grad Sch Fundamental Sci & Engn, Tokyo 1698555, Japan
[2] Honda Res Inst Japan Co Ltd, Saitama 3510114, Japan
Keywords
DOI
Not available
CLC Number
TP24 (Robotics)
Subject Classification Codes
080202; 1405
Abstract
Noise-robust speech recognition is crucial for effective human-machine interaction in real-world environments. Sound source separation (SSS) is one of the most widely used approaches to noise-robust speech recognition: it extracts a target speaker's speech signal while suppressing simultaneous unintended signals. However, conventional SSS algorithms, such as independent component analysis or nonlinear principal component analysis, cannot model complex projections scalably. Moreover, conventional systems require an independent subsystem for noise reduction (NR) in addition to the SSS. To overcome these issues, we propose a deep neural network (DNN) framework for modeling the separation function (SF) of an SSS system. By training a DNN to predict the clean sound features of a target sound from the corresponding multichannel deteriorated sound features, we enable the DNN to model the SF for extracting the target sound without prior knowledge of the acoustic properties of the surrounding environment. Moreover, the same DNN is trained to function simultaneously as an NR filter. Our proposed SSS system is evaluated on an isolated word recognition task and a large-vocabulary continuous speech recognition task in which either nondirectional or directional noise is added to the target speech. Our evaluation results demonstrate that the DNN performs noticeably better than the baseline approach, especially when directional noise is added at a low signal-to-noise ratio.
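The abstract's core idea is a DNN trained to regress clean target-speech features from stacked multichannel corrupted features, so that separation and noise reduction are learned by one network. A minimal sketch of that idea on synthetic features follows; the feature dimensions, network size, channel gains, and training loop are illustrative assumptions, not the authors' configuration:

```python
# Conceptual sketch (not the paper's exact architecture): a small MLP
# learns to map two-channel corrupted features back to clean features,
# so separation and noise reduction are handled by a single network.
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for spectral features: each frame has a clean 8-dim
# target vector; two microphone channels observe it with different gains
# plus additive noise (the "deteriorated" multichannel input).
n_frames, feat_dim = 512, 8
clean = rng.standard_normal((n_frames, feat_dim))
ch1 = 0.9 * clean + 0.6 * rng.standard_normal(clean.shape)
ch2 = 0.5 * clean + 0.6 * rng.standard_normal(clean.shape)
x = np.hstack([ch1, ch2])  # stacked multichannel input, 16-dim per frame

# One-hidden-layer MLP (16 -> 32 -> 8) trained by full-batch gradient
# descent on mean-squared error against the clean features.
hid = 32
W1 = rng.standard_normal((x.shape[1], hid)) * 0.1; b1 = np.zeros(hid)
W2 = rng.standard_normal((hid, feat_dim)) * 0.1;  b2 = np.zeros(feat_dim)

def forward(x):
    h = np.tanh(x @ W1 + b1)
    return h, h @ W2 + b2

lr = 0.1
for step in range(2000):
    h, pred = forward(x)
    err = pred - clean                       # gradient of MSE w.r.t. output
    gW2 = h.T @ err / n_frames; gb2 = err.mean(0)
    dh = (err @ W2.T) * (1 - h ** 2)         # backprop through tanh
    gW1 = x.T @ dh / n_frames; gb1 = dh.mean(0)
    W2 -= lr * gW2; b2 -= lr * gb2
    W1 -= lr * gW1; b1 -= lr * gb1

# The trained network acts as separation function and NR filter at once:
# its output should sit closer to the clean features than either raw channel.
def mse(a, b):
    return float(((a - b) ** 2).mean())

print(f"channel-1 MSE: {mse(ch1, clean):.3f}")
print(f"DNN output MSE: {mse(forward(x)[1], clean):.3f}")
```

This mirrors only the training objective described in the abstract (predict clean features from multichannel corrupted inputs); the paper's actual system operates on real acoustic features and a deeper network.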
Pages: 389-394 (6 pages)
Related Papers (50 total)
  • [1] SOUND SOURCE SEPARATION OF MOVING SPEAKERS FOR ROBOT AUDITION
    Nakadai, Kazuhiro
    Nakajima, Hirofumi
    Hasegawa, Yuji
    Tsujino, Hiroshi
    2009 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS 1- 8, PROCEEDINGS, 2009, : 3685 - 3688
  • [2] High performance sound source separation adaptable to environmental changes for robot audition
    Nakajima, Hirofumi
    Nakadai, Kazuhiro
    Hasegawa, Yuuji
    Tsujino, Hiroshi
    2008 IEEE/RSJ INTERNATIONAL CONFERENCE ON ROBOTS AND INTELLIGENT SYSTEMS, VOLS 1-3, CONFERENCE PROCEEDINGS, 2008, : 2165 - 2171
  • [3] Interactive Sound Source Localization using Robot Audition for Tablet Devices
    Nakamura, Keisuke
    Sinapayen, Lana
    Nakadai, Kazuhiro
    2015 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS (IROS), 2015, : 6137 - 6142
  • [4] Blind source separation for robot audition using fixed HRTF beamforming
    Maazaoui, Mounira
    Abed-Meraim, Karim
    Grenier, Yves
    EURASIP JOURNAL ON ADVANCES IN SIGNAL PROCESSING, 2012,
  • [6] Blind Source Separation for Robot Audition using Fixed Beamforming with HRTFs
    Maazaoui, Mounira
    Grenier, Yves
    Abed-Meraim, Karim
    12TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2011 (INTERSPEECH 2011), VOLS 1-5, 2011, : 3124 - +
  • [7] Improvement of robot audition by interfacing sound source separation and automatic speech recognition with missing feature theory
    Yamamoto, S
    Nakadai, K
    Tsujino, H
    Yokoyama, T
    Okuno, HG
    2004 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION, VOLS 1- 5, PROCEEDINGS, 2004, : 1517 - 1523
  • [8] FREQUENCY DOMAIN BLIND SOURCE SEPARATION FOR ROBOT AUDITION USING A PARAMETERIZED SPARSITY CRITERION
    Maazaoui, Mounira
    Grenier, Yves
    Abed-Meraim, Karim
    19TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO-2011), 2011, : 1869 - 1873
  • [9] Improved sound source localization in horizontal plane for binaural robot audition
    Kim, Ui-Hyun
    Nakadai, Kazuhiro
    Okuno, Hiroshi G.
    APPLIED INTELLIGENCE, 2015, 42 (01) : 63 - 74
  • [10] Exploiting known sound source signals to improve ICA-based robot audition in speech separation and recognition
    Takeda, Ryu
    Nakadai, Kazuhiro
    Komatani, Kazunori
    Ogata, Tetsuya
    Okuno, Hiroshi G.
    2007 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS, VOLS 1-9, 2007, : 1763 - +