Underdetermined Source Separation Based on Generalized Multichannel Variational Autoencoder

Cited by: 18

Authors:

Seki, Shogo [1]

Kameoka, Hirokazu [2]

Li, Li [3]

Toda, Tomoki [4]

Takeda, Kazuya [5]

Affiliations:

[1] Nagoya Univ, Grad Sch Informat, Nagoya, Aichi 4640861, Japan

[2] NTT Corp, Atsugi, Kanagawa 2430198, Japan

[3] Univ Tsukuba, Grad Sch Syst & Informat Engn, Tsukuba, Ibaraki 3058573, Japan

[4] Nagoya Univ, Informat Technol Ctr, Nagoya, Aichi 4640861, Japan

[5] Nagoya Univ, Inst Innovat Future Soc, Nagoya, Aichi 4648603, Japan

Source:

IEEE ACCESS | 2019 / Vol. 7

Keywords:

Underdetermined source separation; variational autoencoder; non-negative matrix factorization; AUDIO SOURCE SEPARATION; NONNEGATIVE MATRIX FACTORIZATION

DOI:

10.1109/ACCESS.2019.2954120

Chinese Library Classification (CLC) number:

TP [Automation Technology, Computer Technology]

Subject classification code:

0812

Abstract:

This paper deals with a multichannel audio source separation problem under underdetermined conditions. Multichannel non-negative matrix factorization (MNMF) is a powerful method for underdetermined audio source separation, which adopts the NMF concept to model and estimate the power spectrograms of the sound sources in a mixture signal. This concept is also used in independent low-rank matrix analysis (ILRMA), a special class of the MNMF formulated under determined conditions. While these methods work reasonably well for particular types of sound sources, one limitation is that they can fail to work for sources with spectrograms that do not comply with the NMF model. To address this limitation, an extension of ILRMA called the multichannel variational autoencoder (MVAE) method was recently proposed, where a conditional VAE (CVAE) is used instead of the NMF model for expressing source power spectrograms. This approach has performed impressively in determined source separation tasks thanks to the representation power of deep neural networks. While the original MVAE method was formulated under determined mixing conditions, this paper proposes a generalized version of it by combining the ideas of MNMF and MVAE so that it can also deal with underdetermined cases. We call this method the generalized MVAE (GMVAE) method. In underdetermined source separation and speech enhancement experiments, the proposed method performed better than baseline methods.
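As a rough illustration of the NMF concept the abstract refers to, the sketch below fits a low-rank non-negative model `W @ H` to a single-channel power spectrogram `V`. It is not the paper's GMVAE or MNMF algorithm: the Itakura-Saito divergence, the square-root multiplicative updates, the toy spectrogram, and all names (`nmf_is`, `is_divergence`, `n_bases`) are assumptions chosen for this example, and the spatial (multichannel) part of the model is left out entirely.

```python
import numpy as np


def is_divergence(V, V_hat):
    """Itakura-Saito divergence between two strictly positive matrices."""
    R = V / V_hat
    return float(np.sum(R - np.log(R) - 1.0))


def nmf_is(V, n_bases, n_iter=100, seed=0, eps=1e-12):
    """Fit V ~= W @ H with non-negative W (freq x bases) and H (bases x time).

    Uses multiplicative updates with a square-root exponent, a common
    majorization-minimization variant for the Itakura-Saito divergence.
    """
    rng = np.random.default_rng(seed)
    n_freq, n_frames = V.shape
    W = rng.uniform(0.1, 1.0, size=(n_freq, n_bases))
    H = rng.uniform(0.1, 1.0, size=(n_bases, n_frames))
    history = []
    for _ in range(n_iter):
        # Update the spectral bases W with H held fixed.
        V_hat = W @ H + eps
        W *= np.sqrt(((V / V_hat**2) @ H.T) / ((1.0 / V_hat) @ H.T))
        # Update the activations H with W held fixed.
        V_hat = W @ H + eps
        H *= np.sqrt((W.T @ (V / V_hat**2)) / (W.T @ (1.0 / V_hat)))
        history.append(is_divergence(V, W @ H + eps))
    return W, H, history


# Toy "power spectrogram": an exactly rank-3 non-negative matrix (hypothetical data).
toy_rng = np.random.default_rng(1)
V = toy_rng.uniform(0.1, 1.0, size=(64, 3)) @ toy_rng.uniform(0.1, 1.0, size=(3, 100))

W, H, history = nmf_is(V, n_bases=3, n_iter=50)
print(f"IS divergence: first iteration {history[0]:.4f}, last iteration {history[-1]:.4f}")
```

Sources whose spectrograms are poorly described by a small number of such bases are the case the abstract identifies as a limitation of NMF-based models, which is the motivation for replacing this model with a CVAE.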


Pages: 168104-168115

Number of pages: 12

50 items in total

[1] Generalized Multichannel Variational Autoencoder for Underdetermined Source Separation
Seki, Shogo
Kameoka, Hirokazu
Li, Li
Toda, Tomoki
Takeda, Kazuya
2019 27TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO), 2019
[2] INVESTIGATION AND COMPARISON OF OPTIMIZATION METHODS FOR VARIATIONAL AUTOENCODER-BASED UNDERDETERMINED MULTICHANNEL SOURCE SEPARATION
Seki, Shogo
Kameoka, Hirokazu
Li, Li
2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 511 - 515
[3] Supervised Determined Source Separation with Multichannel Variational Autoencoder
Kameoka, Hirokazu
Li, Li
Inoue, Shota
Makino, Shoji
NEURAL COMPUTATION, 2019, 31 (09) : 1891 - 1914
[4] Multichannel Variational Autoencoder-Based Speech Separation in Designated Speaker Order
Liao, Lele
Cheng, Guoliang
Ruan, Haoxin
Chen, Kai
Lu, Jing
SYMMETRY-BASEL, 2022, 14 (12)
[5] JOINT SEPARATION AND DEREVERBERATION OF REVERBERANT MIXTURES WITH MULTICHANNEL VARIATIONAL AUTOENCODER
Inoue, Shota
Kameoka, Hirokazu
Li, Li
Seki, Shogo
Makino, Shoji
2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 96 - 100
[6] UNSUPERVISED SPATIAL DICTIONARY LEARNING FOR SPARSE UNDERDETERMINED MULTICHANNEL SOURCE SEPARATION
Nesta, Francesco
Fakhry, Mahmoud
2013 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2013, : 86 - 90
[7] Source Separation in Joint Communication and Radar Systems Based on Unsupervised Variational Autoencoder
Alaghbari, Khaled A.
Lim, Heng Siong
Jin, Benzhou
Shen, Yutong
IEEE OPEN JOURNAL OF VEHICULAR TECHNOLOGY, 2024, 5 : 56 - 70
[8] On underdetermined source separation
Taleb, A
Jutten, C
ICASSP '99: 1999 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, PROCEEDINGS VOLS I-VI, 1999, : 1445 - 1448
[9] SVM based underdetermined blind source separation
Li, Rong-Hua
Yang, Zu-Yuan
Zhao, Min
Xie, Sheng-Li
Dianzi Yu Xinxi Xuebao/Journal of Electronics and Information Technology, 2009, 31 (02) : 319 - 322
[10] Speech Source Separation Using Variational Autoencoder and Bandpass Filter
Do, Hao Duc
Tran, Son Thai
Chau, Duc Thanh
IEEE ACCESS, 2020, 8 : 156219 - 156231
