A Study on Speech Enhancement Based on Diffusion Probabilistic Model

Times cited: 0
Authors
Lu, Yen-Ju [1]
Tsao, Yu [1]
Watanabe, Shinji [2]
Affiliations
[1] Acad Sinica, Res Ctr Informat Technol Innovat, Taipei, Taiwan
[2] Carnegie Mellon Univ, Language Technol Inst, Pittsburgh, PA 15213 USA
Keywords
TO-VECTOR REGRESSION; NOISE; INTELLIGIBILITY
DOI
Not available
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Diffusion probabilistic models have demonstrated an outstanding capability to model natural images and raw audio waveforms through paired diffusion and reverse processes. A distinctive property of the reverse process, namely that it eliminates non-target signals from Gaussian noise and noisy signals, could be exploited to restore clean signals. Building on this property, we propose a diffusion probabilistic model-based speech enhancement (DiffuSE) model that aims to recover clean speech signals from noisy ones. The basic architecture of DiffuSE is similar to that of DiffWave, a high-quality audio waveform generation model with a relatively low computational cost and footprint. To attain better enhancement performance, we designed an advanced reverse process, termed the supportive reverse process, which adds the noisy speech to the predicted speech at each time step. Experimental results show that DiffuSE achieves performance comparable to related audio generative models on the standardized Voice Bank corpus speech enhancement (SE) task. Moreover, relative to the generally suggested full sampling schedule, the supportive reverse process notably improves fast sampling: with only a few steps, it yields better enhancement results than the conventional full-step inference process.
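The abstract describes the supportive reverse process only in words. The following is a minimal NumPy sketch of DDPM-style ancestral sampling in which the noisy speech `y`, scaled to the next noise level, is blended into the predicted mean at every step. The mixing weight `gamma`, the linear beta schedule, starting from a noised copy of `y`, and the exact blending formula are all assumptions made for illustration, not the paper's published equations; `eps_model` is a hypothetical stand-in for a trained DiffWave-style noise predictor.

```python
import numpy as np


def make_schedule(T=50, beta_start=1e-4, beta_end=0.05):
    """Linear beta schedule (hypothetical values, not the paper's settings)."""
    betas = np.linspace(beta_start, beta_end, T)
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)
    return betas, alphas, alpha_bars


def supportive_reverse(y, eps_model, betas, alphas, alpha_bars, gamma=0.2, rng=None):
    """Sketch of a reverse process that mixes the noisy input y into every step.

    gamma is a HYPOTHETICAL mixing weight; gamma=0 recovers plain DDPM sampling.
    eps_model(x_t, t, y) is a hypothetical noise predictor returning an array
    shaped like x_t.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    T = len(betas)
    # Assumption: start from a noised copy of y instead of pure Gaussian noise.
    x = np.sqrt(alpha_bars[-1]) * y + np.sqrt(1.0 - alpha_bars[-1]) * rng.standard_normal(y.shape)
    for t in range(T - 1, -1, -1):
        eps_hat = eps_model(x, t, y)
        # Standard DDPM reverse mean computed from the predicted noise.
        mean = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps_hat) / np.sqrt(alphas[t])
        alpha_bar_prev = alpha_bars[t - 1] if t > 0 else 1.0
        # Posterior standard deviation; it is exactly zero at the final step.
        sigma = np.sqrt((1.0 - alpha_bar_prev) / (1.0 - alpha_bars[t]) * betas[t])
        # Assumed supportive step: blend in y scaled to the next noise level.
        mean = (1.0 - gamma) * mean + gamma * np.sqrt(alpha_bar_prev) * y
        x = mean + sigma * rng.standard_normal(y.shape)
    return x
```

Without a trained network, the sampler can be sanity-checked by passing an oracle `eps_model` that knows the clean signal: with `gamma=0` the final step then returns the clean signal exactly, while any `gamma > 0` shows how strongly the noisy input is carried through.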
Pages: 659-666
Page count: 8