A Two-Stage Beamforming and Diffusion-Based Refiner System for 3D Speech Enhancement

Cited by: 0
Authors
Chen, Feilong [1]
Lin, Wenmo [1]
Sun, Chengli [1]
Guo, Qiaosheng [2]
Affiliations
[1] Nanchang Hangkong Univ, Sch Informat Engn, Nanchang 330063, Peoples R China
[2] Chaoyang Jushengtai Xinfeng Technol Co Ltd, Ganzhou 341001, Peoples R China
Keywords
Speech enhancement; 3D speech signal; Diffusion model; Beamforming; Multi-channel
DOI
10.1007/s00034-024-02652-y
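Chinese Library Classification (CLC) number
TM [Electrical engineering technology]; TN [Electronic technology, communication technology]
Discipline classification code
0808; 0809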
Abstract
Speech enhancement in 3D reverberant environments is a challenging and significant problem for many downstream applications, such as speech recognition, speaker identification, and audio analysis. Existing deep neural network models have shown efficacy for 3D speech enhancement tasks, but they often introduce distortions or unnatural artifacts in the enhanced speech. In this work, we propose a novel two-stage refiner system that integrates a neural beamforming network and a diffusion model for robust 3D speech enhancement. The neural beamforming network performs spatial filtering to suppress noise and reverberation, while the diffusion model leverages its generative capability to restore missing or distorted speech components from the beamformed output. To the best of our knowledge, this is the first work that applies a diffusion model as a backend refiner for 3D speech enhancement. We investigate the effect of training the diffusion model with either enhanced speech or clean speech, and find that training on clean speech better captures prior knowledge of speech components and improves speech recovery. We evaluate the proposed system on different datasets and beamformer architectures, and show that it achieves consistent improvements in metrics such as WER and NISQA, indicating that the diffusion model generalizes well and can serve as a backend refinement module for 3D speech enhancement regardless of the front-end beamforming network. Our work demonstrates the effectiveness of integrating discriminative and generative models for robust 3D speech enhancement, and opens up a new direction for applying generative diffusion models to 3D speech processing tasks as a backend to various beamforming enhancement methods.
Pages: 4369 - 4389
Page count: 21
Related papers
50 items in total
  • [31] A generic diffusion-based approach for 3D human pose prediction in the wild
    Saadatnejad, Saeed
    Rasekh, Ali
    Mofayezi, Mohammadreza
    Medghalchi, Yasamin
    Rajahzadeh, Sara
    Mordan, Taylor
    Alahi, Alexandre
    2023 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION (ICRA 2023), 2023, : 8246 - 8253
  • [32] Speech Enhancement Using a Two-Stage Network for an Efficient Boosting Strategy
    Kim, Juntae
    Hahn, Minsoo
    IEEE SIGNAL PROCESSING LETTERS, 2019, 26 (05) : 770 - 774
  • [33] A two-stage algorithm for one-microphone reverberant speech enhancement
    Wu, MY
    Wang, DL
    IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2006, 14 (03) : 774 - 784
  • [34] Two-Stage Deep Learning for Noisy-Reverberant Speech Enhancement
    Zhao, Yan
    Wang, Zhong-Qiu
    Wang, DeLiang
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2019, 27 (01) : 53 - 62
  • [35] Two-Stage Temporal Processing for Single-Channel Speech Enhancement
    Samui, Sunzan
    Chakrabarti, Indrajit
    Ghosh, Soumya Kanti
    17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES, 2016, : 3723 - 3727
  • [36] A Two-Stage Adaptive Clustering Approach for 3D Point Clouds
    Zhang, Caihong
    Wang, Shaoping
    Yu, Biao
    Li, Bichun
    Zhu, Hui
    2019 4TH ASIA-PACIFIC CONFERENCE ON INTELLIGENT ROBOT SYSTEMS (ACIRS 2019), 2019, : 11 - 16
  • [37] 3D mesh segmentation using a two-stage merging strategy
    Pan, X
    Ye, XZ
    Zhang, SY
    FOURTH INTERNATIONAL CONFERENCE ON COMPUTER AND INFORMATION TECHNOLOGY, PROCEEDINGS, 2004, : 730 - 733
  • [38] TSF: Two-Stage Sequential Fusion for 3D Object Detection
    Qi, Heng
    Shi, Peicheng
    Liu, Zhiqiang
    Yang, Aixi
    IEEE SENSORS JOURNAL, 2022, 22 (12) : 12163 - 12172
  • [39] TSFF: a two-stage fusion framework for 3D object detection
    Jiang, Guoqing
    Li, Saiya
    Huang, Ziyu
    Cai, Guorong
    Su, Jinhe
    PeerJ Computer Science, 2024, 10
  • [40] Reconstruction of 3D genome architecture via a two-stage algorithm
    Mark R. Segal
    Henrik L. Bengtsson
    BMC Bioinformatics, 16