Semi-Supervised Source Localization in Reverberant Environments with Deep Generative Modeling

Cited by: 0
Authors
Bianco, Michael J. [1 ]
Gannot, Sharon [2 ]
Fernandez-Grande, Efren [3 ]
Gerstoft, Peter [1 ]
Affiliations
[1] Marine Physical Laboratory, University of California San Diego, San Diego, CA, 92093, United States
[2] Faculty of Engineering, Bar-Ilan University, Ramat-Gan, 5290002, Israel
[3] Department of Electrical Engineering, Technical University of Denmark, Kongens Lyngby, 2800, Denmark
Funding
European Union Horizon 2020
Keywords
Multiple signal classification - Learning systems - Music - Signal processing - Reverberation - Supervised learning - Computer music;
DOI
Not available
Abstract
Localization in reverberant environments remains an open challenge. Recently, supervised learning approaches have demonstrated very promising results in addressing reverberation. However, even with large data volumes, the number of labels available for supervised learning in such environments is usually small. We propose to address this issue with a semi-supervised learning (SSL) approach, based on deep generative modeling. Our chosen deep generative model, the variational autoencoder (VAE), is trained to generate the phase of relative transfer functions (RTFs) between microphones. In parallel, a direction of arrival (DOA) classifier network based on RTF-phase is also trained. The joint generative and discriminative model, deemed VAE-SSL, is trained using labeled and unlabeled RTF-phase sequences. In learning to generate and classify the sequences, the VAE-SSL extracts the physical causes of the RTF-phase (i.e., source location) from distracting signal characteristics such as noise and speech activity. This facilitates effective end-to-end operation of the VAE-SSL, which requires minimal preprocessing of RTF-phase. VAE-SSL is compared with two signal processing-based approaches, steered response power with phase transform (SRP-PHAT) and MUltiple SIgnal Classification (MUSIC), as well as fully supervised CNNs. The approaches are compared using data from two real acoustic environments - one of which was recently obtained at Technical University of Denmark specifically for our study. We find that VAE-SSL can outperform the conventional approaches and the CNN in label-limited scenarios. Further, the trained VAE-SSL system can generate new RTF-phase samples which capture the physics of the acoustic environment. Thus, the generative modeling in VAE-SSL provides a means of interpreting the learned representations. To the best of our knowledge, this paper presents the first approach to modeling the physics of acoustic propagation using deep generative modeling. © 2013 IEEE.
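The abstract's central feature, the phase of the relative transfer function (RTF) between two microphones, can be illustrated with a minimal sketch. This is not the authors' pipeline: the function name `rtf_phase`, the frame length, the hop size, and the white-noise test signal are all assumptions chosen for illustration. The sketch averages the cross-spectrum of two channels over short-time frames and takes its angle; since the RTF estimate is the cross-spectrum divided by the real, positive auto-spectrum of the reference channel, the two share the same phase.

```python
import numpy as np

def rtf_phase(x_ref, x_other, n_fft=256, hop=128):
    """Estimate the RTF phase of x_other relative to x_ref (hypothetical helper)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x_ref) - n_fft) // hop
    cross = np.zeros(n_fft // 2 + 1, dtype=complex)
    for i in range(n_frames):
        seg = slice(i * hop, i * hop + n_fft)
        X_ref = np.fft.rfft(window * x_ref[seg])
        X_other = np.fft.rfft(window * x_other[seg])
        # Accumulate the cross-spectrum; dividing by the (real, positive)
        # auto-spectrum of the reference would not change the phase.
        cross += X_other * np.conj(X_ref)
    return np.angle(cross)

# Toy example (assumption): the second channel is the first delayed by 2 samples.
rng = np.random.default_rng(0)
x1 = rng.standard_normal(16384)
delay = 2
x2 = np.roll(x1, delay)

phase = rtf_phase(x1, x2)
# For a pure delay the phase is approximately linear in the frequency bin index.
expected = -2 * np.pi * np.arange(len(phase)) * delay / 256
```

In this anechoic toy case the phase is close to a straight line whose slope encodes the inter-microphone delay. In a reverberant room the phase deviates from that line, which is the kind of structure the abstract says the VAE-SSL learns to generate and classify.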
Pages: 84956 - 84970
Related papers
50 in total
  • [1] Semi-Supervised Source Localization in Reverberant Environments With Deep Generative Modeling
    Bianco, Michael J.
    Gannot, Sharon
    Fernandez-Grande, Efren
    Gerstoft, Peter
    IEEE ACCESS, 2021, 9 : 84956 - 84970
  • [2] SEMI-SUPERVISED SOURCE LOCALIZATION WITH DEEP GENERATIVE MODELING
    Bianco, Michael J.
    Gannot, Sharon
    Gerstoft, Peter
    PROCEEDINGS OF THE 2020 IEEE 30TH INTERNATIONAL WORKSHOP ON MACHINE LEARNING FOR SIGNAL PROCESSING (MLSP), 2020,
  • [3] Semi-Supervised Multiple Source Localization Using Relative Harmonic Coefficients Under Noisy and Reverberant Environments
    Hu, Yonggang
    Samarasinghe, Prasanga N.
    Gannot, Sharon
    Abhayapala, Thushara D.
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2020, 28 (28) : 3108 - 3123
  • [4] Semi-supervised Learning with Deep Generative Models
    Kingma, Diederik P.
    Rezende, Danilo J.
    Mohamed, Shakir
    Welling, Max
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 27 (NIPS 2014), 2014, 27
  • [5] Semi-Supervised Learning for Deep Causal Generative Models
    Ibrahim, Yasin
    Warr, Hermione
    Kamnitsas, Konstantinos
    MEDICAL IMAGE COMPUTING AND COMPUTER ASSISTED INTERVENTION - MICCAI 2024, PT XII, 2024, 15012 : 294 - 303
  • [6] Source localization in reverberant environments: Modeling and statistical analysis
    Gustafsson, T
    Rao, BD
    Trivedi, M
    IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, 2003, 11 (06) : 791 - 803
  • [7] Emotional Voice Conversion with Semi-Supervised Generative Modeling
    Zhu, Hai
    Zhan, Huayi
    Cheng, Hong
    Wu, Ying
    INTERSPEECH 2023, 2023, : 2278 - 2282
  • [8] Semi-Supervised Analysis of the Electrocardiogram Using Deep Generative Models
    Rasmussen, Soren M.
    Jensen, Malte E. K.
    Meyhoff, Christian S.
    Aasvang, Eske K.
    Sorensen, Helge B. D.
    2021 43RD ANNUAL INTERNATIONAL CONFERENCE OF THE IEEE ENGINEERING IN MEDICINE & BIOLOGY SOCIETY (EMBC), 2021, : 1124 - 1127
  • [9] Learning Disentangled Representations with Semi-Supervised Deep Generative Models
    Siddharth, N.
    Paige, Brooks
    van de Meent, Jan-Willem
    Desmaison, Alban
    Goodman, Noah D.
    Kohli, Pushmeet
    Wood, Frank
    Torr, Philip H. S.
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 30 (NIPS 2017), 2017, 30
  • [10] A Semi-supervised Deep Generative Model for Human Body Analysis
    de Bem, Rodrigo
    Ghosh, Arnab
    Ajanthan, Thalaiyasingam
    Miksik, Ondrej
    Siddharth, N.
    Torr, Philip
    COMPUTER VISION - ECCV 2018 WORKSHOPS, PT II, 2019, 11130 : 500 - 517