MambaGAN: Mamba based Metric GAN for Monaural Speech Enhancement

Citations: 0
Authors
Luo, Tianhao [1 ,2 ,3 ]
Zhou, Feng [1 ,2 ,3 ,4 ]
Bai, Zhongxin [1 ,2 ,3 ]
Affiliations
[1] Harbin Engn Univ, Natl Key Lab Underwater Acoust Technol, Harbin 150001, Peoples R China
[2] Harbin Engn Univ, Minist Ind & Informat Technol, Key Lab Marine Informat Acquisit & Secur, Harbin 150001, Peoples R China
[3] Harbin Engn Univ, Coll Underwater Acoust Engn, Harbin 150001, Peoples R China
[4] Harbin Engn Univ, Sanya Nanhai Innovat & Dev Base, Sanya 572024, Peoples R China
Source
2024 INTERNATIONAL CONFERENCE ON ASIAN LANGUAGE PROCESSING, IALP 2024 | 2024
Keywords
Mamba; Omni-dimensional Dynamic Convolution; perceptual contrast stretching; speech enhancement;
DOI
10.1109/IALP63756.2024.10661187
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Encoder-decoder structures are widely used in deep neural network-based speech enhancement (SE), often built from convolutional and transformer modules. The high computational demands of transformers, however, often limit real-time performance. To address this issue, we propose MambaGAN, a novel speech enhancement network that combines MambaFormer and ODConv within a GAN framework. MambaFormer is a Mamba-based structure that replaces the transformer in SE networks. Additionally, an Omni-dimensional Dynamic Convolution (ODConv) replaces the standard convolutional modules to capture richer speech features more flexibly. Experimental results on the VoiceBank+DEMAND dataset show that MambaGAN achieves an impressive PESQ score of 3.56. When combined with perceptual contrast stretching, it reaches a new state-of-the-art PESQ score of 3.72 while exhibiting lower computational complexity than existing conformer-based models.
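As a rough illustration of the perceptual contrast stretching (PCS) post-processing mentioned in the abstract: PCS can be sketched as a power-law stretch applied to a compressed magnitude spectrogram. The `gamma` value and the omission of the band-wise gains used in the full PCS formulation are simplifying assumptions here, not the paper's exact method.

```python
import numpy as np

def perceptual_contrast_stretching(mag, gamma=1.4):
    """Simplified PCS sketch (assumed form, not the paper's exact recipe).

    Compresses the magnitude spectrogram with log1p, raises the result
    to a power gamma > 1 to stretch spectral contrast, then inverts the
    compression back to a linear magnitude scale.
    """
    compressed = np.log1p(mag)       # perceptual-style dynamic-range compression
    stretched = compressed ** gamma  # contrast stretching in the compressed domain
    return np.expm1(stretched)       # invert the compression

# Toy magnitude spectrogram: 4 frequency bins x 3 frames
mag = np.abs(np.random.default_rng(0).normal(size=(4, 3)))
out = perceptual_contrast_stretching(mag)
print(out.shape)  # (4, 3)
```

With `gamma > 1`, larger compressed magnitudes are boosted relative to smaller ones, which sharpens spectral peaks against the noise floor; the stretched spectrogram would then be paired with the noisy phase for resynthesis.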
Pages: 411-416
Page count: 6