MambaGAN: Mamba based Metric GAN for Monaural Speech Enhancement

Citations: 0
Authors
Luo, Tianhao [1 ,2 ,3 ]
Zhou, Feng [1 ,2 ,3 ,4 ]
Bai, Zhongxin [1 ,2 ,3 ]
Affiliations
[1] Harbin Engn Univ, Natl Key Lab Underwater Acoust Technol, Harbin 150001, Peoples R China
[2] Harbin Engn Univ, Minist Ind & Informat Technol, Key Lab Marine Informat Acquisit & Secur, Harbin 150001, Peoples R China
[3] Harbin Engn Univ, Coll Underwater Acoust Engn, Harbin 150001, Peoples R China
[4] Harbin Engn Univ, Sanya Nanhai Innovat & Dev Base, Sanya 572024, Peoples R China
Source
2024 INTERNATIONAL CONFERENCE ON ASIAN LANGUAGE PROCESSING, IALP 2024 | 2024
Keywords
Mamba; Omni-dimensional Dynamic Convolution; perceptual contrast stretching; speech enhancement;
DOI
10.1109/IALP63756.2024.10661187
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Encoder-decoder structures are widely used in deep neural network-based speech enhancement (SE), often built from convolutional and transformer modules. The high computational demands of transformers, however, often limit real-time performance. To address this issue, we propose MambaGAN, a novel speech enhancement network that combines MambaFormer and ODConv within a GAN framework. MambaFormer is a Mamba-based structure that replaces the transformer in SE networks. Additionally, an Omni-dimensional Dynamic Convolution (ODConv) replaces the standard convolutional modules to capture richer speech features more flexibly. Experimental results on the VoiceBank+DEMAND dataset show that MambaGAN achieves an impressive PESQ score of 3.56. When combined with perceptual contrast stretching, it reaches a new state-of-the-art PESQ score of 3.72 while exhibiting lower computational complexity than existing conformer-based models.
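As a rough illustration of the perceptual contrast stretching (PCS) post-processing mentioned in the abstract: PCS can be sketched as a power-law stretch applied to a compressed magnitude spectrogram. The `gamma` value and the omission of the band-wise gains used in the full PCS formulation are simplifying assumptions here, not the paper's exact method.

```python
import numpy as np

def perceptual_contrast_stretching(mag, gamma=1.4):
    """Simplified PCS sketch (assumed form, not the paper's exact recipe).

    Compresses the magnitude spectrogram with log1p, raises the result
    to a power gamma > 1 to stretch spectral contrast, then inverts the
    compression back to a linear magnitude scale.
    """
    compressed = np.log1p(mag)       # perceptual-style dynamic-range compression
    stretched = compressed ** gamma  # contrast stretching in the compressed domain
    return np.expm1(stretched)       # invert the compression

# Toy magnitude spectrogram: 4 frequency bins x 3 frames
mag = np.abs(np.random.default_rng(0).normal(size=(4, 3)))
out = perceptual_contrast_stretching(mag)
print(out.shape)  # (4, 3)
```

With `gamma > 1`, larger compressed magnitudes are boosted relative to smaller ones, which sharpens spectral peaks against the noise floor; the stretched spectrogram would then be paired with the noisy phase for resynthesis.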
Pages: 411-416
Page count: 6