A Study on Speech Enhancement Based on Diffusion Probabilistic Model

Cited by: 0

Authors:

Lu, Yen-Ju [1]

Tsao, Yu [1]

Watanabe, Shinji [2]

Affiliations:

[1] Acad Sinica, Res Ctr Informat Technol Innovat, Taipei, Taiwan

[2] Carnegie Mellon Univ, Language Technol Inst, Pittsburgh, PA 15213 USA

Source:

2021 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC) | 2021

Keywords:

TO-VECTOR REGRESSION; NOISE; INTELLIGIBILITY;

DOI:

Not available

Chinese Library Classification:

TP [Automation Technology, Computer Technology];

Subject classification code:

0812 ;

Abstract:

Diffusion probabilistic models have demonstrated an outstanding capability to model natural images and raw audio waveforms through paired diffusion and reverse processes. The unique property of the reverse process (namely, eliminating non-target signals from Gaussian noise and noisy signals) can be used to restore clean signals. Based on this property, we propose a diffusion probabilistic model-based speech enhancement (DiffuSE) model that aims to recover clean speech signals from noisy signals. The fundamental architecture of the proposed DiffuSE model is similar to that of DiffWave, a high-quality audio waveform generation model that has a relatively low computational cost and footprint. To attain better enhancement performance, we designed an advanced reverse process, termed the supportive reverse process, which adds noisy speech to the predicted speech at each time step. The experimental results show that DiffuSE yields performance comparable to that of related audio generative models on the standardized Voice Bank corpus speech enhancement (SE) task. Moreover, relative to the generally suggested full sampling schedule, the proposed supportive reverse process especially improved fast sampling, requiring only a few steps to yield better enhancement results than the conventional full-step inference process.
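The idea of adding noisy speech to the predicted speech at each reverse step can be illustrated with a minimal sketch. This is not the authors' implementation: the linear schedule in `make_schedule`, the blending weight `gamma`, the starting point (the noisy signal itself), and the `predict_noise` callback standing in for a trained DiffWave-style network are all assumptions made for illustration. The update before the blend is the standard DDPM ancestral sampling step.

```python
import math
import random


def make_schedule(num_steps, beta_start=1e-4, beta_end=0.05):
    """Hypothetical linear noise schedule.

    Returns the per-step betas and the cumulative products of (1 - beta).
    """
    span = max(num_steps - 1, 1)
    betas = [beta_start + (beta_end - beta_start) * t / span for t in range(num_steps)]
    alpha_bars, prod = [], 1.0
    for beta in betas:
        prod *= 1.0 - beta
        alpha_bars.append(prod)
    return betas, alpha_bars


def supportive_reverse(noisy, predict_noise, betas, alpha_bars, gamma=0.2, rng=None):
    """Sketch of a reverse process that re-injects noisy speech at every step.

    noisy         : list of float samples (the noisy speech signal)
    predict_noise : callable (x, t) -> list of floats; stand-in for a trained model
    gamma         : assumed blending weight for the noisy speech (illustrative only)
    """
    rng = rng or random.Random(0)
    x = list(noisy)  # assumption: start from the noisy signal, not pure Gaussian noise
    for t in reversed(range(len(betas))):
        beta, a_bar = betas[t], alpha_bars[t]
        alpha = 1.0 - beta
        eps = predict_noise(x, t)
        # Standard DDPM posterior mean given the predicted noise.
        scale = beta / math.sqrt(1.0 - a_bar)
        est = [(xi - scale * ei) / math.sqrt(alpha) for xi, ei in zip(x, eps)]
        if t > 0:
            # Standard DDPM posterior standard deviation; no noise on the last step.
            sigma = math.sqrt(beta * (1.0 - alpha_bars[t - 1]) / (1.0 - a_bar))
            est = [e + sigma * rng.gauss(0.0, 1.0) for e in est]
        # "Supportive" step (assumed form): blend noisy speech into the estimate.
        x = [(1.0 - gamma) * e + gamma * y for e, y in zip(est, noisy)]
    return x


if __name__ == "__main__":
    # A short 6-step schedule mimics "fast sampling"; the predictor is a dummy.
    betas, alpha_bars = make_schedule(6)
    noisy_speech = [0.5, -0.25, 0.1, 0.0]
    dummy_model = lambda x, t: [0.0] * len(x)
    print(supportive_reverse(noisy_speech, dummy_model, betas, alpha_bars))
```

With `gamma=0` the loop reduces to plain ancestral sampling, and with `gamma=1` it simply returns the noisy input, so the weight controls how strongly the noisy speech guides each step in this sketch.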


Pages: 659-666

Page count: 8
