Sound Source Separation for Robot Audition using Deep Learning

Cited by: 0
Authors
Noda, Kuniaki [1 ]
Hashimoto, Naoya [1 ]
Nakadai, Kazuhiro [2 ]
Ogata, Tetsuya [1 ]
Affiliations
[1] Waseda Univ, Grad Sch Fundamental Sci & Engn, Tokyo 1698555, Japan
[2] Honda Res Inst Japan Co Ltd, Saitama 3510114, Japan
Keywords
DOI
Not available
Chinese Library Classification number
TP24 [Robot technology];
Discipline classification code
080202 ; 1405 ;
Abstract
Noise robust speech recognition is crucial for effective human-machine interaction in real-world environments. Sound source separation (SSS) is one of the most widely used approaches for addressing noise robust speech recognition by extracting a target speaker's speech signal while suppressing simultaneous unintended signals. However, conventional SSS algorithms, such as independent component analysis or nonlinear principal component analysis, are limited in modeling complex projections with scalability. Moreover, conventional systems required designing an independent subsystem for noise reduction (NR) in addition to the SSS. To overcome these issues, we propose a deep neural network (DNN) framework for modeling the separation function (SF) of an SSS system. By training a DNN to predict clean sound features of a target sound from corresponding multichannel deteriorated sound feature inputs, we enable the DNN to model the SF for extracting the target sound without prior knowledge regarding the acoustic properties of the surrounding environment. Moreover, the same DNN is trained to function simultaneously as an NR filter. Our proposed SSS system is evaluated using an isolated word recognition task and a large vocabulary continuous speech recognition task when either nondirectional or directional noise is accumulated in the target speech. Our evaluation results demonstrate that the DNN performs noticeably better than the baseline approach, especially when directional noise is accumulated with a low signal-to-noise ratio.
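The training setup described in the abstract (a network that maps multichannel deteriorated features to clean target features) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the synthetic Gaussian "features", the channel gains, the single tanh hidden layer, the layer sizes, and plain full-batch gradient descent on a mean squared error. The abstract does not state the actual architecture, features, or optimizer.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: N frames, F features per channel, C microphone channels.
N, F, C = 256, 8, 4

# Synthetic stand-ins for real acoustic features (assumption, not the paper's data).
clean = rng.normal(size=(N, F))        # target speaker features
interferer = rng.normal(size=(N, F))   # directional noise source
target_gain = [1.0, 0.8, 0.6, 0.9]     # per-channel gain of the target (assumed)
noise_gain = [0.5, 0.9, 0.7, 0.3]      # per-channel gain of the interferer (assumed)

# Each channel observes a different mixture plus a little diffuse noise.
channels = [
    target_gain[c] * clean + noise_gain[c] * interferer
    + 0.1 * rng.normal(size=(N, F))
    for c in range(C)
]
noisy = np.concatenate(channels, axis=1)   # shape (N, C * F)

# One-hidden-layer regression network (sizes are illustrative).
n_hidden = 32
W1 = rng.normal(0.0, 0.1, (C * F, n_hidden))
b1 = np.zeros(n_hidden)
W2 = rng.normal(0.0, 0.1, (n_hidden, F))
b2 = np.zeros(F)

def forward(x):
    """Return hidden activations and predicted clean features."""
    h = np.tanh(x @ W1 + b1)
    return h, h @ W2 + b2

lr = 0.02
losses = []
for _ in range(300):
    h, pred = forward(noisy)
    err = pred - clean
    losses.append(float(np.mean(err ** 2)))   # loss before this update

    # Backpropagate the mean squared error through both layers.
    grad_out = 2.0 * err / err.size
    grad_h = (grad_out @ W2.T) * (1.0 - h ** 2)
    W2 -= lr * (h.T @ grad_out)
    b2 -= lr * grad_out.sum(axis=0)
    W1 -= lr * (noisy.T @ grad_h)
    b1 -= lr * grad_h.sum(axis=0)

_, pred = forward(noisy)
print("initial loss:", losses[0], "final loss:", losses[-1])
```

Because the network only sees noisy inputs and clean targets, it is never told the channel gains; any separation and noise reduction it achieves is learned from the data, which is the idea the abstract describes.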
Pages: 389-394
Page count: 6
Related papers
50 in total
  • [41] Localizing Bird Songs Using an Open Source Robot Audition System with a Microphone Array
    Suzuki, Reiji
    Matsubayashi, Shiho
    Nakadai, Kazuhiro
    Okuno, Hiroshi G.
    17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES, 2016, : 2626 - 2630
  • [42] Blind Source Separation of Radar Signals in Time Domain Using Deep Learning
    Hinderer, Sven
    2022 23RD INTERNATIONAL RADAR SYMPOSIUM (IRS), 2022, : 486 - 491
  • [43] The Cocktail Party Robot: Sound Source Separation and Localisation with an Active Binaural Head
    Deleforge, Antoine
    Horaud, Radu
    HRI'12: PROCEEDINGS OF THE SEVENTH ANNUAL ACM/IEEE INTERNATIONAL CONFERENCE ON HUMAN-ROBOT INTERACTION, 2012, : 431 - 438
  • [44] Real time robot audition system incorporating both 3D sound source localisation and voice characterisation
    Rudzyn, Ben
    Kadous, Waleed
    Sammut, Claude
    PROCEEDINGS OF THE 2007 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION, VOLS 1-10, 2007, : 4733 - +
  • [45] SSLIDE: SOUND SOURCE LOCALIZATION FOR INDOORS BASED ON DEEP LEARNING
    Wu, Yifan
    Ayyalasomayajula, Roshan
    Bianco, Michael J.
    Bharadia, Dinesh
    Gerstoft, Peter
    2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 4680 - 4684
  • [46] Underwater Sound Source Range Estimation Based on Deep Learning
    Qu, Yuchen
    Huang, Yiqian
    Ren, Xinmin
    Chen, Yang
    Han, Tianshun
    OCEANS 2024 - SINGAPORE, 2024,
  • [47] Phased microphone array for sound source localization with deep learning
    Ma W.
    Liu X.
    Aerospace Systems, 2019, 2 (2) : 71 - 81
  • [48] Speech Separation Using Deep Learning
    Nandal, P.
    SUSTAINABLE COMMUNICATION NETWORKS AND APPLICATION, ICSCN 2019, 2020, 39 : 319 - 326
  • [49] Persian Music Source Separation in Audio-Visual Data Using Deep Learning
    Hashemi, Seyedeh Sogand
    Aghabozorgi, Masoudreza
    Sadeghi, Mohammad Taghi
    2020 6TH IRANIAN CONFERENCE ON SIGNAL PROCESSING AND INTELLIGENT SYSTEMS (ICSPIS), 2020,
  • [50] Localizing an Intermittent and Moving Sound Source Using a Mobile Robot
    Nguyen, Quan V.
    Colas, Francis
    Vincent, Emmanuel
    Charpillet, Francois
    2016 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS (IROS 2016), 2016, : 1986 - 1991