Multichannel Loss Function for Supervised Speech Source Separation by Mask-based Beamforming

Cited by: 4
Authors
Masuyama, Yoshiki [1, 2]
Togami, Masahito [2 ]
Komatsu, Tatsuya [2 ]
Affiliations
[1] Waseda Univ, Dept Intermedia Art & Sci, Tokyo, Japan
[2] LINE Corp, Tokyo, Japan
Source
INTERSPEECH 2019 | 2019
Keywords
Speaker-independent multi-talker separation; neural beamformer; multichannel Itakura-Saito divergence;
DOI
10.21437/Interspeech.2019-1289
Chinese Library Classification number
R36 [Pathology]; R76 [Otorhinolaryngology];
Discipline classification code
100104 ; 100213 ;
Abstract
In this paper, we propose two mask-based beamforming methods using a deep neural network (DNN) trained by multichannel loss functions. Beamforming techniques using time-frequency (TF) masks estimated by a DNN have been used in many applications, where the TF masks are used for estimating spatial covariance matrices. To train a DNN for mask-based beamforming, loss functions designed for monaural speech enhancement/separation have been employed. Although such a training criterion is simple, it does not directly correspond to the performance of mask-based beamforming. To overcome this problem, we use multichannel loss functions which evaluate the estimated spatial covariance matrices based on the multichannel Itakura-Saito divergence. DNNs trained by the multichannel loss functions can be applied to construct several beamformers. Experimental results confirmed their effectiveness and robustness to microphone configurations.
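The two ingredients named in the abstract, mask-weighted spatial covariance matrices and a multichannel Itakura-Saito divergence between such matrices, can be illustrated with a minimal NumPy sketch. The abstract does not give the paper's exact formulas, so this is not the authors' loss function: the function names `masked_scm` and `mis_divergence`, the array layout, and the diagonal loading `eps` are all assumptions made for illustration. The divergence uses one common definition, tr(A B^-1) - log det(A B^-1) - M for M x M positive definite matrices.

```python
import numpy as np


def masked_scm(X, mask, eps=1e-8):
    """Mask-weighted spatial covariance matrices (hypothetical helper).

    X:    complex STFT of the mixture, shape (F, T, M)
          (frequency bins, time frames, microphones)
    mask: TF-mask with values in [0, 1], shape (F, T)
    Returns an array of shape (F, M, M): one Hermitian matrix per frequency.
    """
    # Sum over frames of mask[f, t] * x[f, t] x[f, t]^H
    num = np.einsum("ft,ftm,ftn->fmn", mask, X, X.conj())
    # Normalise by the total mask weight in each frequency bin
    return num / (mask.sum(axis=1)[:, None, None] + eps)


def mis_divergence(A, B, eps=1e-6):
    """Multichannel Itakura-Saito divergence per frequency bin.

    Computes tr(A B^-1) - log det(A B^-1) - M for stacks of
    Hermitian positive definite matrices of shape (F, M, M).
    The small diagonal loading `eps` is an assumption added for
    numerical stability.
    """
    M = A.shape[-1]
    load = eps * np.eye(M)
    A = A + load
    B = B + load
    # tr(A B^-1) equals tr(B^-1 A); solve avoids an explicit inverse
    trace = np.trace(np.linalg.solve(B, A), axis1=-2, axis2=-1).real
    # log det(A B^-1) = log det(A) - log det(B)
    _, logdet_a = np.linalg.slogdet(A)
    _, logdet_b = np.linalg.slogdet(B)
    return trace - (logdet_a - logdet_b) - M
```

In a training setup along the lines the abstract describes, a divergence of this kind would compare a covariance matrix obtained from DNN-estimated masks against one obtained from reference signals; which matrix goes in which argument is a design choice the abstract does not specify.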
Pages: 2708-2712
Number of pages: 5