Multichannel Linear Prediction-Based Speech Dereverberation Considering Sparse and Low-Rank Priors

Cited by: 1
Authors
Wang, Taihui [1,2]
Yang, Feiran [3]
Yang, Jun [1,2]
Affiliations
[1] Chinese Acad Sci, Key Lab Noise & Vibrat Res, Inst Acoust, Beijing 100190, Peoples R China
[2] Univ Chinese Acad Sci, Beijing 100049, Peoples R China
[3] Chinese Acad Sci, State Key Lab Acoust, Inst Acoust, Beijing 100190, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Speech processing; Reverberation; Microphones; Time-frequency analysis; Shape; Indexes; Cost function; Speech dereverberation; multichannel linear prediction; weighted prediction error; complex generalized Gaussian; nonnegative matrix factorization; TIME; REVERBERATION; MASKING; DOMAIN; SYSTEM; NOISE;
DOI
10.1109/TASLP.2024.3369535
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
This article addresses the multi-channel linear prediction (MCLP)-based speech dereverberation problem by jointly considering the sparse and low-rank priors of speech spectrograms. We utilize the complex generalized Gaussian (CGG) distribution as the source model and the generalized nonnegative matrix factorization (NMF) as the spectral model. The difference between the presented model and existing ones for MCLP is twofold. First, we adopt the CGG distribution with a time-frequency-variant scale parameter instead of that with a time-frequency-invariant scale parameter. Second, the time-frequency-varying scale parameter is approximated by NMF in a low-rank manner. Based on the maximum-likelihood criterion, speech dereverberation is formulated as an optimization problem that minimizes the prediction error weighted by the reciprocal of sparse and low-rank parameters. A convergence-guaranteed algorithm is derived to estimate the parameters using the majorization-minimization technique. The WPE, NMF-based WPE and CGG-based WPE can be treated as special cases of the proposed method with different shape and domain parameters. As a byproduct, the proposed method provides a simple and elegant way to derive the CGG-based WPE algorithm. A series of experiments show the superiority of the proposed method over WPE, NMF-based WPE and CGG-based WPE methods.
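To make the idea of a "weighted prediction error" concrete, the following is a minimal sketch of an iteratively reweighted MCLP filter for a single frequency bin. It is not the algorithm of the article: it omits the NMF low-rank model of the scale parameter entirely and only illustrates the reweighting loop. The function name `wpe_one_bin`, the parameters `delay`, `taps`, `shape`, `iters`, `eps`, and the weight form (|d|^2 + eps)^(shape/2 - 1) are assumptions chosen for illustration; with shape = 0 the weight reduces to the familiar 1/|d|^2 weighting, and with shape = 2 to ordinary least squares.

```python
import numpy as np

def wpe_one_bin(x, delay=2, taps=4, shape=0.0, iters=3, eps=1e-8):
    """Illustrative reweighted MCLP for one frequency bin (hypothetical helper).

    x: complex array of shape (channels, frames).
    Returns the prediction residual of channel 0, shape (frames,).
    """
    n_ch, n_frames = x.shape
    # Stack delayed observations: block k holds all channels delayed by delay + k frames.
    x_tilde = np.zeros((n_ch * taps, n_frames), dtype=complex)
    for k in range(taps):
        shift = delay + k
        if shift < n_frames:
            x_tilde[k * n_ch:(k + 1) * n_ch, shift:] = x[:, :n_frames - shift]

    d = x[0].astype(complex)  # initial estimate: the reference microphone signal
    for _ in range(iters):
        # Assumed weight form; it is NOT taken from the article.
        w = (np.abs(d) ** 2 + eps) ** (shape / 2.0 - 1.0)
        xw = x_tilde * w                      # weight each frame (column)
        R = xw @ x_tilde.conj().T             # weighted correlation matrix
        r = xw @ x[0].conj()                  # weighted correlation vector
        g = np.linalg.solve(R + eps * np.eye(R.shape[0]), r)
        d = x[0] - g.conj() @ x_tilde         # subtract predicted late reverberation
    return d
```

In use, the function would be applied independently to each frequency bin of a multichannel STFT. The small diagonal loading `eps` is an assumption added here for numerical safety.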
Pages: 1724-1735
Page count: 12
Related papers
50 in total
  • [31] A Signal Subspace Speech Enhancement Approach Based on Joint Low-Rank and Sparse Matrix Decomposition
    Sun, Chengli
    Xie, Jianxiao
    Leng, Yan
    ARCHIVES OF ACOUSTICS, 2016, 41 (02) : 245 - 254
  • [32] Speech Denoising in White Noise Based on Signal Subspace Low-rank Plus Sparse Decomposition
    Yuan, Shuai
    Sun, Cheng-li
    2017 INTERNATIONAL CONFERENCE ON ELECTRONIC INFORMATION TECHNOLOGY AND COMPUTER ENGINEERING (EITCE 2017), 2017, 128
  • [33] Linear Prediction-based Parallel WaveGAN Speech Synthesis
    Hwang, Min-Jae
    Yoon, Hyun-Wook
    Song, Chan-Ho
    Kim, Jin-Seob
    Kim, Jae-Min
    Song, Eunwoo
    2022 INTERNATIONAL CONFERENCE ON ELECTRONICS, INFORMATION, AND COMMUNICATION (ICEIC), 2022,
  • [34] REFERENCE MICROPHONE SELECTION AND LOW-RANK APPROXIMATION BASED MULTICHANNEL WIENER FILTER WITH APPLICATION TO SPEECH RECOGNITION
    Chen, Xing-yu
    Zhang, Jie
    Dai, Li-rong
    2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 4963 - 4967
  • [35] Jointly Using Low-Rank and Sparsity Priors for Sparse Inverse Synthetic Aperture Radar Imaging
    Qiu, Wei
    Zhou, Jianxiong
    Fu, Qiang
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2020, 29 : 100 - 115
  • [36] Towards a sparse low-rank regression model for memorability prediction of images
    Chu, Jinghui
    Gu, Huimin
    Su, Yuting
    Jing, Peiguang
    NEUROCOMPUTING, 2018, 321 : 357 - 368
  • [37] A Framework of Joint Low-Rank and Sparse Regression for Image Memorability Prediction
    Jing, Peiguang
    Su, Yuting
    Nie, Liqiang
    Gu, Huimin
    Liu, Jing
    Wang, Meng
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2019, 29 (05) : 1296 - 1309
  • [38] Linear Prediction-Based Online Dereverberation and Noise Reduction Using Alternating Kalman Filters
    Braun, Sebastian
    Habets, Emanuel A. P.
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2018, 26 (06) : 1115 - 1125
  • [39] A PERCEPTUALLY MOTIVATED APPROACH VIA SPARSE AND LOW-RANK MODEL FOR SPEECH ENHANCEMENT
    Min, Gang
    Zhang, Xiongwei
    Yang, Jibin
    Han, Wei
    Zou, Xia
    2016 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA & EXPO (ICME), 2016,
  • [40] DNN-Based Linear Prediction Residual Enhancement for Speech Dereverberation
    Feng, Xinyang
    Li, Nuo
    He, Zunwen
    Zhang, Yan
    Zhang, Wancheng
    2021 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2021, : 541 - 545