Semi-Supervised Source Localization in Reverberant Environments with Deep Generative Modeling

Cited by: 0

Authors:

Bianco, Michael J. [1]

Gannot, Sharon [2]

Fernandez-Grande, Efren [3]

Gerstoft, Peter [1]

Affiliations:

[1] Marine Physical Laboratory, University of California San Diego, San Diego, CA 92093, United States

[2] Faculty of Engineering, Bar-Ilan University, Ramat-Gan 5290002, Israel

[3] Department of Electrical Engineering, Technical University of Denmark, Kongens Lyngby 2800, Denmark

Source:

IEEE Access | 2021 / Vol. 9

Funding:

European Union Horizon 2020

Keywords:

Multiple signal classification; Learning systems; Music; Signal processing; Reverberation; Supervised learning; Computer music

DOI:

Not available

Chinese Library Classification number:

Subject classification number:

Abstract:

Localization in reverberant environments remains an open challenge. Recently, supervised learning approaches have demonstrated very promising results in addressing reverberation. However, even with large data volumes, the number of labels available for supervised learning in such environments is usually small. We propose to address this issue with a semi-supervised learning (SSL) approach, based on deep generative modeling. Our chosen deep generative model, the variational autoencoder (VAE), is trained to generate the phase of relative transfer functions (RTFs) between microphones. In parallel, a direction of arrival (DOA) classifier network based on RTF-phase is also trained. The joint generative and discriminative model, termed VAE-SSL, is trained using labeled and unlabeled RTF-phase sequences. In learning to generate and classify the sequences, the VAE-SSL extracts the physical causes of the RTF-phase (i.e., source location) from distracting signal characteristics such as noise and speech activity. This facilitates effective end-to-end operation of the VAE-SSL, which requires minimal preprocessing of RTF-phase. VAE-SSL is compared with two signal processing-based approaches, steered response power with phase transform (SRP-PHAT) and MUltiple SIgnal Classification (MUSIC), as well as fully supervised CNNs. The approaches are compared using data from two real acoustic environments, one of which was recently obtained at Technical University of Denmark specifically for our study. We find that VAE-SSL can outperform the conventional approaches and the CNN in label-limited scenarios. Further, the trained VAE-SSL system can generate new RTF-phase samples which capture the physics of the acoustic environment. Thus, the generative modeling in VAE-SSL provides a means of interpreting the learned representations. To the best of our knowledge, this paper presents the first approach to modeling the physics of acoustic propagation using deep generative modeling. © 2013 IEEE.

Pages: 84956-84970
