Sound Source Separation for Robot Audition using Deep Learning

Cited by: 0
Authors
Noda, Kuniaki [1 ]
Hashimoto, Naoya [1 ]
Nakadai, Kazuhiro [2 ]
Ogata, Tetsuya [1 ]
Affiliations
[1] Waseda Univ, Grad Sch Fundamental Sci & Engn, Tokyo 1698555, Japan
[2] Honda Res Inst Japan Co Ltd, Saitama 3510114, Japan
Keywords
DOI
Not available
CLC Number
TP24 (Robotics)
Subject Classification Codes
080202; 1405
Abstract
Noise-robust speech recognition is crucial for effective human-machine interaction in real-world environments. Sound source separation (SSS) is one of the most widely used approaches to noise-robust speech recognition: it extracts a target speaker's speech signal while suppressing simultaneous unintended signals. However, conventional SSS algorithms, such as independent component analysis or nonlinear principal component analysis, cannot model complex projections scalably. Moreover, conventional systems require an independent subsystem for noise reduction (NR) in addition to the SSS. To overcome these issues, we propose a deep neural network (DNN) framework for modeling the separation function (SF) of an SSS system. By training a DNN to predict the clean sound features of a target sound from the corresponding multichannel deteriorated sound features, we enable the DNN to model the SF for extracting the target sound without prior knowledge of the acoustic properties of the surrounding environment. Moreover, the same DNN is trained to function simultaneously as an NR filter. Our proposed SSS system is evaluated on an isolated word recognition task and a large-vocabulary continuous speech recognition task in which either nondirectional or directional noise is added to the target speech. Our evaluation results demonstrate that the DNN performs noticeably better than the baseline approach, especially when directional noise is added at a low signal-to-noise ratio.
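The abstract's core idea is a DNN trained to regress clean target-speech features from stacked multichannel corrupted features, so that separation and noise reduction are learned by one network. A minimal sketch of that idea on synthetic features follows; the feature dimensions, network size, channel gains, and training loop are illustrative assumptions, not the authors' configuration:

```python
# Conceptual sketch (not the paper's exact architecture): a small MLP
# learns to map two-channel corrupted features back to clean features,
# so separation and noise reduction are handled by a single network.
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for spectral features: each frame has a clean 8-dim
# target vector; two microphone channels observe it with different gains
# plus additive noise (the "deteriorated" multichannel input).
n_frames, feat_dim = 512, 8
clean = rng.standard_normal((n_frames, feat_dim))
ch1 = 0.9 * clean + 0.6 * rng.standard_normal(clean.shape)
ch2 = 0.5 * clean + 0.6 * rng.standard_normal(clean.shape)
x = np.hstack([ch1, ch2])  # stacked multichannel input, 16-dim per frame

# One-hidden-layer MLP (16 -> 32 -> 8) trained by full-batch gradient
# descent on mean-squared error against the clean features.
hid = 32
W1 = rng.standard_normal((x.shape[1], hid)) * 0.1; b1 = np.zeros(hid)
W2 = rng.standard_normal((hid, feat_dim)) * 0.1;  b2 = np.zeros(feat_dim)

def forward(x):
    h = np.tanh(x @ W1 + b1)
    return h, h @ W2 + b2

lr = 0.1
for step in range(2000):
    h, pred = forward(x)
    err = pred - clean                       # gradient of MSE w.r.t. output
    gW2 = h.T @ err / n_frames; gb2 = err.mean(0)
    dh = (err @ W2.T) * (1 - h ** 2)         # backprop through tanh
    gW1 = x.T @ dh / n_frames; gb1 = dh.mean(0)
    W2 -= lr * gW2; b2 -= lr * gb2
    W1 -= lr * gW1; b1 -= lr * gb1

# The trained network acts as separation function and NR filter at once:
# its output should sit closer to the clean features than either raw channel.
def mse(a, b):
    return float(((a - b) ** 2).mean())

print(f"channel-1 MSE: {mse(ch1, clean):.3f}")
print(f"DNN output MSE: {mse(forward(x)[1], clean):.3f}")
```

This mirrors only the training objective described in the abstract (predict clean features from multichannel corrupted inputs); the paper's actual system operates on real acoustic features and a deeper network.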
Pages: 389-394 (6 pages)
Related Papers (50 total)
  • [1] SOUND SOURCE SEPARATION OF MOVING SPEAKERS FOR ROBOT AUDITION
    Nakadai, Kazuhiro
    Nakajima, Hirofumi
    Hasegawa, Yuji
    Tsujino, Hiroshi
    2009 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS 1- 8, PROCEEDINGS, 2009, : 3685 - 3688
  • [2] High performance sound source separation adaptable to environmental changes for robot audition
    Nakajima, Hirofumi
    Nakadai, Kazuhiro
    Hasegawa, Yuuji
    Tsujino, Hiroshi
    2008 IEEE/RSJ INTERNATIONAL CONFERENCE ON ROBOTS AND INTELLIGENT SYSTEMS, VOLS 1-3, CONFERENCE PROCEEDINGS, 2008, : 2165 - 2171
  • [3] Interactive Sound Source Localization using Robot Audition for Tablet Devices
    Nakamura, Keisuke
    Sinapayen, Lana
    Nakadai, Kazuhiro
    2015 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS (IROS), 2015, : 6137 - 6142
  • [4] Blind source separation for robot audition using fixed HRTF beamforming
    Maazaoui, Mounira
    Abed-Meraim, Karim
    Grenier, Yves
    EURASIP JOURNAL ON ADVANCES IN SIGNAL PROCESSING, 2012,
  • [6] Blind Source Separation for Robot Audition using Fixed Beamforming with HRTFs
    Maazaoui, Mounira
    Grenier, Yves
    Abed-Meraim, Karim
    12TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2011 (INTERSPEECH 2011), VOLS 1-5, 2011, : 3124 - +
  • [7] Improvement of robot audition by interfacing sound source separation and automatic speech recognition with missing feature theory
    Yamamoto, S
    Nakadai, K
    Tsujino, H
    Yokoyama, T
    Okuno, HG
    2004 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION, VOLS 1- 5, PROCEEDINGS, 2004, : 1517 - 1523
  • [8] FREQUENCY DOMAIN BLIND SOURCE SEPARATION FOR ROBOT AUDITION USING A PARAMETERIZED SPARSITY CRITERION
    Maazaoui, Mounira
    Grenier, Yves
    Abed-Meraim, Karim
    19TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO-2011), 2011, : 1869 - 1873
  • [9] Improved sound source localization in horizontal plane for binaural robot audition
    Kim, Ui-Hyun
    Nakadai, Kazuhiro
    Okuno, Hiroshi G.
    APPLIED INTELLIGENCE, 2015, 42 (01) : 63 - 74
  • [10] Exploiting known sound source signals to improve ICA-based robot audition in speech separation and recognition
    Takeda, Ryu
    Nakadai, Kazuhiro
    Komatani, Kazunori
    Ogata, Tetsuya
    Okuno, Hiroshi G.
    2007 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS, VOLS 1-9, 2007, : 1763 - +