Speech Enhancement using Convolution Neural Network-based Spectrogram Denoising

Cited by: 0

Authors:

Hu Xuhong [1]

Yan Lin-Huang [2]

Lu Xun [3]

Guan Yuan-Sheng [2]

Hu Wenlin [1]

Wang Jie [2,4]

Affiliations:

[1] China Railway Design Corp, Natl Engn Lab Digital Construct & Evaluat Urban R, Tianjin, Peoples R China

[2] Guangzhou Univ, Sch Elect & Commun Engn, Guangzhou, Guangdong, Peoples R China

[3] Guangdong Power Grid Co, Power Grid Planning Ctr, Guangzhou, Guangdong, Peoples R China

[4] Ctr Rd Traff Noise Control, Natl Environm Protect Engn & Technol, Beijing, Peoples R China

Source:

PROCEEDINGS OF 2021 7TH INTERNATIONAL CONFERENCE ON CONDITION MONITORING OF MACHINERY IN NON-STATIONARY OPERATIONS (CMMNO) | 2021

Keywords:

Speech enhancement; deep learning; convolution neural network; spectrogram denoising; NOISE; EFFICIENT

DOI:

10.1109/CMMNO53328.2021.9467599

Chinese Library Classification (CLC) number:

TH [Machinery and instrument industry]

Subject classification code:

0802

Abstract:

Treating the spectrogram as an image, this paper adopts a convolutional neural network (CNN)-based image enhancement algorithm for spectrogram denoising, so that speech is denoised when the spectrogram is enhanced. A spectrogram clipping strategy is presented to obtain a large amount of training data; it reduces storage cost and avoids the limited network depth and excessive complexity commonly encountered when training a recurrent neural network on traditional speech features. Meanwhile, a deeper network is constructed to increase capacity and flexibility so that the features of the spectrogram are better exploited, and to capture enough spatial information for effective noise reduction. In addition, the proposed model uses a residual learning strategy in CNN training, combined with batch normalization, which greatly improves the performance of the model. The experimental results demonstrate that the proposed spectrogram denoising model has better learning ability and denoising performance under both known-noise and noise-mismatch conditions, so the proposed system provides robust speech enhancement.
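
The spectrogram clipping idea mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the patch size, the stride, or the spectrogram dimensions, so the function name `clip_spectrogram` and the values `patch_size=40` and `stride=20` below are assumptions chosen only for illustration, not the authors' settings.

```python
import numpy as np

def clip_spectrogram(spec, patch_size=40, stride=20):
    """Cut a 2-D spectrogram (freq_bins x frames) into square patches.

    patch_size and stride are illustrative assumptions, not values
    taken from the paper.
    """
    n_freq, n_frames = spec.shape
    patches = []
    # Slide a square window over frequency and time with the given stride.
    for f in range(0, n_freq - patch_size + 1, stride):
        for t in range(0, n_frames - patch_size + 1, stride):
            patches.append(spec[f:f + patch_size, t:t + patch_size])
    if not patches:
        # The spectrogram is smaller than one patch in some dimension.
        return np.empty((0, patch_size, patch_size), dtype=spec.dtype)
    return np.stack(patches)

# A random array stands in for a log-magnitude spectrogram here.
rng = np.random.default_rng(0)
spec = rng.standard_normal((128, 200))
patches = clip_spectrogram(spec)
print(patches.shape)
```

Overlapping patches of this kind turn a single utterance into many small training images, which is one plausible reading of how clipping yields a large amount of training data.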

Pages: 310-318

Number of pages: 9
