Sound Source Separation for Robot Audition using Deep Learning

Cited by: 0
Authors
Noda, Kuniaki [1 ]
Hashimoto, Naoya [1 ]
Nakadai, Kazuhiro [2 ]
Ogata, Tetsuya [1 ]
Affiliations
[1] Waseda Univ, Grad Sch Fundamental Sci & Engn, Tokyo 1698555, Japan
[2] Honda Res Inst Japan Co Ltd, Saitama 3510114, Japan
Keywords
DOI
Not available
Chinese Library Classification (CLC)
TP24 [Robotics];
Discipline codes
080202; 1405;
Abstract
Noise-robust speech recognition is crucial for effective human-machine interaction in real-world environments. Sound source separation (SSS) is one of the most widely used approaches to achieving noise-robust speech recognition: it extracts a target speaker's speech signal while suppressing simultaneous unintended signals. However, conventional SSS algorithms, such as independent component analysis or nonlinear principal component analysis, are limited in their ability to model complex projections in a scalable way. Moreover, conventional systems require designing an independent subsystem for noise reduction (NR) in addition to the SSS. To overcome these issues, we propose a deep neural network (DNN) framework for modeling the separation function (SF) of an SSS system. By training a DNN to predict clean sound features of a target sound from the corresponding multichannel deteriorated sound feature inputs, we enable the DNN to model the SF for extracting the target sound without prior knowledge of the acoustic properties of the surrounding environment. Moreover, the same DNN is trained to function simultaneously as an NR filter. Our proposed SSS system is evaluated on an isolated word recognition task and a large-vocabulary continuous speech recognition task, with either nondirectional or directional noise added to the target speech. The evaluation results demonstrate that the DNN-based system performs noticeably better than the baseline approach, especially when directional noise is added at a low signal-to-noise ratio.
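The core idea in the abstract is a single DNN trained by regression to map multichannel corrupted features to the clean features of the target speaker, so that separation and noise reduction are learned jointly in one model. The following PyTorch snippet is a minimal sketch of that idea under assumed settings (microphone count, log-mel features, context window, and layer widths are illustrative choices); it is not the authors' reported architecture or training recipe.

```python
# Minimal sketch: feed-forward DNN mapping stacked multichannel noisy features
# to clean single-channel target features via MSE regression.
# N_CHANNELS, N_MEL, CONTEXT, and layer sizes are assumptions for illustration.
import torch
import torch.nn as nn

N_CHANNELS = 8   # assumed microphone-array size
N_MEL = 40       # assumed log-mel filterbank dimension
CONTEXT = 5      # assumed number of stacked context frames

in_dim = N_CHANNELS * N_MEL * CONTEXT
out_dim = N_MEL

model = nn.Sequential(
    nn.Linear(in_dim, 1024), nn.ReLU(),
    nn.Linear(1024, 1024), nn.ReLU(),
    nn.Linear(1024, out_dim),
)

criterion = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

def train_step(noisy_multichannel, clean_target):
    """One gradient step.
    noisy_multichannel: (batch, in_dim) stacked multichannel noisy features.
    clean_target: (batch, out_dim) clean target-speaker features."""
    optimizer.zero_grad()
    pred = model(noisy_multichannel)
    loss = criterion(pred, clean_target)
    loss.backward()
    optimizer.step()
    return loss.item()

# Toy usage with random tensors standing in for real feature matrices.
x = torch.randn(32, in_dim)
y = torch.randn(32, out_dim)
print(train_step(x, y))
```

Because the network regresses directly onto clean target features, the same set of weights implicitly performs both spatial separation (from the multichannel input) and spectral noise reduction, which is the joint SSS/NR behavior the abstract describes.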
Pages: 389-394
Number of pages: 6