Single-Channel Speech Enhancement Based on Improved Frame-Iterative Spectral Subtraction in the Modulation Domain

Cited by: 0

Authors:

Li, Chao [1]

Jiang, Ting [1]

Wu, Sheng [1]

Affiliations:

[1] Beijing Univ Posts & Telecommun, Sch Informat & Commun Engn, Beijing 100876, Peoples R China

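Source:

CHINA COMMUNICATIONS | 2021 / Vol. 18 / Issue 09

Funding:

National Natural Science Foundation of China; Major Program of the National Natural Science Foundation of China;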

Keywords:

short-time modulation domain; single-channel speech enhancement; modulation improved frame iterative spectral subtraction; low SNRs; MEAN-SQUARE ERROR; NOISE; MAGNITUDE;

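DOI:

Not available

Chinese Library Classification (CLC) number:

TN [Electronic technology, communication technology];

Discipline classification code: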

0809 ;

Abstract:

To address the musical noise introduced by classical spectral subtraction, a short-time modulation (STM) domain spectral subtraction method has been successfully applied to single-channel speech enhancement. However, because voice activity detection (VAD) is inaccurate, the residual musical noise and the enhancement performance still need further improvement, especially in low signal-to-noise ratio (SNR) scenarios. To address this issue, an improved frame iterative spectral subtraction in the STM domain (IMModSSub) is proposed. More specifically, exploiting the inter-frame correlation, noise subtraction is applied directly to each frame of the noisy signal in the STM domain. Then, the noisy signal is classified into speech or silence frames based on a predefined threshold of segmental SNR. With these classification results, a corresponding mask function is developed for the noisy speech after noise subtraction. Finally, exploiting the increased sparsity of the speech signal in the modulation domain, the orthogonal matching pursuit (OMP) technique is applied to the speech frames to improve speech quality and intelligibility. The effectiveness of the proposed method is evaluated with three types of noise: white noise, pink noise, and hfchannel noise. The results show that the proposed method outperforms some established baselines at lower SNRs (-5 to +5 dB).
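
The subtract, classify, and mask steps summarized in the abstract can be illustrated with a minimal sketch. This is not the authors' IMModSSub method: it works on plain per-frame magnitude spectra rather than in the STM domain, it omits the frame iteration and the OMP stage, and every name and parameter value here (`spectral_subtract`, `frame_snr_db`, `enhance`, `floor`, `snr_threshold_db`, `silence_gain`) is a hypothetical choice made for illustration only.

```python
import math


def spectral_subtract(noisy_mag, noise_mag, floor=0.02):
    """Power spectral subtraction for one frame, with a spectral floor.

    `floor` is an assumed value; it keeps each bin from dropping below
    a small fraction of the noisy magnitude.
    """
    out = []
    for y, n in zip(noisy_mag, noise_mag):
        diff = y * y - n * n              # subtract the noise power estimate
        floor_pow = (floor * y) ** 2      # lowest power allowed for this bin
        out.append(math.sqrt(max(diff, floor_pow)))
    return out


def frame_snr_db(enhanced_mag, noise_mag, eps=1e-12):
    """Rough per-frame SNR in dB: enhanced energy over noise energy."""
    sig = sum(s * s for s in enhanced_mag)
    noi = sum(n * n for n in noise_mag)
    return 10.0 * math.log10((sig + eps) / (noi + eps))


def enhance(frames, noise_mag, snr_threshold_db=0.0, silence_gain=0.1):
    """Subtract noise, label each frame, then apply a simple two-level mask.

    Frames whose estimated SNR exceeds the (assumed) threshold are kept
    as speech; the others are attenuated by `silence_gain`.
    """
    results = []
    for frame in frames:
        sub = spectral_subtract(frame, noise_mag)
        is_speech = frame_snr_db(sub, noise_mag) > snr_threshold_db
        gain = 1.0 if is_speech else silence_gain
        results.append((is_speech, [gain * s for s in sub]))
    return results


# Two toy frames: the first carries energy well above the noise estimate,
# the second matches the noise estimate exactly.
frames = [[3.0, 4.0], [1.0, 1.0]]
noise = [1.0, 1.0]
results = enhance(frames, noise)
```

In this toy input the first frame is labeled as speech and passed through after subtraction, while the second is labeled as silence and attenuated. A real system would estimate `noise_mag` from the signal instead of assuming it is known.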

Pages: 100 - 115

Page count: 16

50 items in total

[1] Single-Channel Speech Enhancement Based on Improved Frame-Iterative Spectral Subtraction in the Modulation Domain
Chao Li
Ting Jiang
Sheng Wu
China Communications, 2021, 18 (09) : 100 - 115
[2] Single-channel speech enhancement using spectral subtraction in the short-time modulation domain
Paliwal, Kuldip
Wojcicki, Kamil
Schwerin, Belinda
SPEECH COMMUNICATION, 2010, 52 (05) : 450 - 475
[3] Modulation Domain Spectral Subtraction for Speech Enhancement
Paliwal, Kuldip
Schwerin, Belinda
Wojcicki, Kamil
INTERSPEECH 2009: 10TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2009, VOLS 1-5, 2009, : 1343 - 1346
[4] Single-channel speech enhancement using Kalman filtering in the modulation domain
So, Stephen
Wojcicki, Kamil K.
Paliwal, Kuldip K.
11TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2010 (INTERSPEECH 2010), VOLS 1-2, 2010, : 993 - 996
[5] New Results in Modulation-Domain Single-Channel Speech Enhancement
Mowlaee, Pejman
Blass, Martin
Kleijn, W. Bastiaan
IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2017, 25 (11) : 2125 - 2137
[6] Modulation-domain Kalman filtering for single-channel speech enhancement
So, Stephen
Paliwal, Kuldip K.
SPEECH COMMUNICATION, 2011, 53 (06) : 818 - 829
[7] Single-channel speech enhancement based on frequency domain ALE
Nakanishi, Isao
Nagata, Yuudai
Itoh, Yoshio
Fukui, Yutaka
2006 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS, VOLS 1-11, PROCEEDINGS, 2006, : 2541 - 2544
[8] Complex tensor factorization in modulation frequency domain for single-channel speech enhancement
Masaya, Shogo
Unoki, Masashi
16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5, 2015, : 1765 - 1769
[9] A SPECTRAL CONVERSION BASED SINGLE-CHANNEL SINGLE-MICROPHONE SPEECH ENHANCEMENT
Huy-Khoi Do
Quang Vinh Thai
FOURTH INTERNATIONAL CONFERENCE ON COMPUTER AND ELECTRICAL ENGINEERING (ICCEE 2011), 2011, : 583 - +
[10] A spectral conversion approach to single-channel speech enhancement
Mouchtaris, Athanasios
Van der Spiegel, Jan
Mueller, Paul
Tsakalides, Panagiotis
IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2007, 15 (04) : 1180 - 1193
