Vector-quantized Variational Autoencoder for Phase-aware Speech Enhancement

Cited by: 2
Authors
Tuan Vu Ho [1]
Quoc Huy Nguyen [1]
Akagi, Masato [1]
Unoki, Masashi [1]
Affiliations
[1] Japan Advanced Institute of Science and Technology, Nomi, Japan
Source
Interspeech 2022
Keywords
Speech enhancement; vector-quantized variational autoencoder; complex Wiener filter; noise reduction; NOISE
DOI
10.21437/Interspeech.2022-443
Chinese Library Classification
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Speech-enhancement methods based on the complex ideal ratio mask (cIRM) have achieved promising results. These methods often deploy a deep neural network to jointly estimate the real and imaginary components of the cIRM, which is defined in the complex domain. However, the cIRM is unbounded, which makes it difficult to train a neural network effectively. To alleviate this problem, this paper proposes a phase-aware speech-enhancement method that estimates the magnitude and phase of a complex adaptive Wiener filter. In this method, a noise-robust vector-quantized variational autoencoder estimates the magnitude of the Wiener filter using the Itakura-Saito divergence in the time-frequency domain, while a convolutional recurrent network estimates the phase of the Wiener filter using the scale-invariant signal-to-noise-ratio constraint in the time domain. The proposed method was evaluated on the open Voice Bank+DEMAND dataset to provide a direct comparison with other speech-enhancement methods. It achieved a Perceptual Evaluation of Speech Quality score of 2.85 and a Short-Time Objective Intelligibility score of 0.94, which is better than the state-of-the-art method based on cIRM estimation during the 2020 Deep Noise Challenge.
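The abstract names two training criteria and a filter described by magnitude and phase. The NumPy sketch below illustrates the textbook forms of those quantities only; it is not the authors' implementation. The function names, the `eps` stabilizer, and the mean reduction are assumptions made for illustration, and the record does not specify how the paper normalizes or weights these terms.

```python
import numpy as np

def itakura_saito(p, p_hat, eps=1e-8):
    """Itakura-Saito divergence between two non-negative power spectra.

    Averaged over all time-frequency bins (the reduction is an assumption).
    """
    ratio = (p + eps) / (p_hat + eps)
    return float(np.mean(ratio - np.log(ratio) - 1.0))

def si_snr(estimate, target, eps=1e-8):
    """Scale-invariant signal-to-noise ratio in dB for 1-D time-domain signals."""
    estimate = estimate - estimate.mean()
    target = target - target.mean()
    # Project the estimate onto the target so that any gain difference is removed.
    s_target = (estimate @ target) / (target @ target + eps) * target
    e_noise = estimate - s_target
    return float(10.0 * np.log10((s_target @ s_target + eps) / (e_noise @ e_noise + eps)))

def apply_complex_filter(noisy_stft, magnitude, phase):
    """Apply a complex filter, given as magnitude and phase, to a complex spectrogram."""
    return noisy_stft * (magnitude * np.exp(1j * phase))
```

In this sketch a filter with unit magnitude and zero phase leaves the spectrogram unchanged, and `si_snr` returns the same value when the estimate is rescaled, which is the property that makes it scale-invariant.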
Pages: 176-180
Page count: 5