A Study on Speech Enhancement Based on Diffusion Probabilistic Model

Cited by: 0
Authors
Lu, Yen-Ju [1]
Tsao, Yu [1]
Watanabe, Shinji [2]
Affiliations
[1] Acad Sinica, Res Ctr Informat Technol Innovat, Taipei, Taiwan
[2] Carnegie Mellon Univ, Language Technol Inst, Pittsburgh, PA 15213 USA
Keywords
TO-VECTOR REGRESSION; NOISE; INTELLIGIBILITY;
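DOI
Not available
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract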
Diffusion probabilistic models have demonstrated an outstanding capability to model natural images and raw audio waveforms through paired diffusion and reverse processes. The unique property of the reverse process (namely, eliminating non-target signals from the Gaussian noise and noisy signals) can be used to restore clean signals. Based on this property, we propose a diffusion probabilistic model-based speech enhancement (DiffuSE) model that aims to recover clean speech signals from noisy signals. The fundamental architecture of the proposed DiffuSE model is similar to that of DiffWave, a high-quality audio waveform generation model with a relatively low computational cost and footprint. To attain better enhancement performance, we designed an advanced reverse process, termed the supportive reverse process, which adds noisy speech to the predicted speech at each time step. The experimental results show that DiffuSE yields performance comparable to that of related audio generative models on the standardized Voice Bank corpus SE task. Moreover, relative to the generally suggested full sampling schedule, the proposed supportive reverse process especially improved fast sampling, which takes only a few steps yet yields better enhancement results than the conventional full-step inference process.
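The supportive reverse process described in the abstract can be illustrated with a rough sketch. The abstract does not give the update rule, so everything below is an assumption made for illustration: the step is a standard DDPM-style reverse update, the noisy speech is blended in with a hypothetical fixed weight `gamma`, and `predict_noise` stands in for a trained DiffWave-like network. The function names, the beta schedule, and the blending form are not taken from the paper.

```python
import numpy as np


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.05):
    """Hypothetical linear noise schedule; the paper's schedule may differ."""
    return np.linspace(beta_start, beta_end, num_steps)


def supportive_reverse_step(x_t, noisy, t, betas, predict_noise, gamma, rng):
    """One reverse step that mixes the noisy speech into the prediction.

    ASSUMPTION: the mixing rule (1 - gamma) * prediction + gamma * noisy is an
    illustrative stand-in, not the coefficients used by DiffuSE.
    """
    alphas = 1.0 - betas
    alpha_bar = np.cumprod(alphas)

    # Standard DDPM posterior mean computed from the predicted noise.
    eps = predict_noise(x_t, noisy, t)
    pred = (x_t - betas[t] / np.sqrt(1.0 - alpha_bar[t]) * eps) / np.sqrt(alphas[t])

    # Add sampling noise on every step except the last one (t == 0).
    if t > 0:
        sigma = np.sqrt((1.0 - alpha_bar[t - 1]) / (1.0 - alpha_bar[t]) * betas[t])
        pred = pred + sigma * rng.standard_normal(x_t.shape)

    # "Supportive" part: add the noisy speech to the predicted speech.
    return (1.0 - gamma) * pred + gamma * noisy


def enhance(noisy, betas, predict_noise, gamma=0.2, seed=0):
    """Run the full reverse loop, starting from Gaussian noise."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(noisy.shape)
    for t in reversed(range(len(betas))):
        x = supportive_reverse_step(x, noisy, t, betas, predict_noise, gamma, rng)
    return x


if __name__ == "__main__":
    # Placeholder predictor that returns zeros; a real system would use a trained network.
    dummy_predictor = lambda x, y, t: np.zeros_like(x)
    noisy_speech = np.sin(np.linspace(0.0, 1.0, 16))
    enhanced = enhance(noisy_speech, linear_beta_schedule(6), dummy_predictor)
    print(enhanced.shape)
```

A short schedule such as the six steps used here loosely mirrors the fast sampling setting mentioned in the abstract; with the placeholder predictor the output is not meaningful speech, only a check that the loop runs.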
Pages: 659-666
Page count: 8
Related papers
50 in total
  • [31] Diffiner: A Versatile Diffusion-based Generative Refiner for Speech Enhancement
    Sawata, Ryosuke
    Murata, Naoki
    Takida, Yuhta
    Uesaka, Toshimitsu
    Shibuya, Takashi
    Takahashi, Shusuke
    Mitsufuji, Yuki
    INTERSPEECH 2023, 2023, : 3824 - 3828
  • [32] Variance-Preserving-Based Interpolation Diffusion Models for Speech Enhancement
    Guo, Zilu
    Du, Jun
    Lee, Chin-Hui
    Gao, Yu
    Zhang, Wenbin
    INTERSPEECH 2023, 2023, : 1065 - 1069
  • [33] Model-based eigenspectrum estimation for speech enhancement
    Bhunjun, Vinesh
    Brookes, Mike
    Naylor, Patrick
    2006 FORTIETH ASILOMAR CONFERENCE ON SIGNALS, SYSTEMS AND COMPUTERS, VOLS 1-5, 2006, : 1331 - +
  • [34] STATISTICAL-MODEL-BASED SPEECH ENHANCEMENT SYSTEMS
    EPHRAIM, Y
    PROCEEDINGS OF THE IEEE, 1992, 80 (10) : 1526 - 1555
  • [35] Model-Based Speech Enhancement in the Modulation Domain
    Wang, Yu
    Brookes, Mike
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2018, 26 (03) : 580 - 594
  • [36] ON THE INFLUENCE OF INHARMONICITIES IN MODEL-BASED SPEECH ENHANCEMENT
    Norholm, Sidsel Marie
    Jensen, Jesper Rindom
    Christensen, Mads Graesboll
    2013 PROCEEDINGS OF THE 21ST EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO), 2013,
  • [37] NOISE IDENTIFICATION FOR MODEL-BASED SPEECH ENHANCEMENT
    Jiang Wenbin
    Ying Rendong
    Liu Peilin
    2014 12TH INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING (ICSP), 2014, : 478 - 483
  • [38] Model-Based Speech Enhancement for Automotive Applications
    Krini, Mohamed
    Schmidt, Gerhard
    2009 PROCEEDINGS OF 6TH INTERNATIONAL SYMPOSIUM ON IMAGE AND SIGNAL PROCESSING AND ANALYSIS (ISPA 2009), 2009, : 638 - 643
  • [39] MODEL BASED BINAURAL ENHANCEMENT OF VOICED AND UNVOICED SPEECH
    Kavalekalam, Mathew Shaji
    Christensen, Mads Graesboll
    Boldt, Jesper B.
    2017 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2017, : 666 - 670
  • [40] Speech enhancement based on AR model parameters estimation
    Deng, Feng
    Bao, Changchun
    SPEECH COMMUNICATION, 2016, 79 : 30 - 46