3D Speech Enhancement Algorithm for Two-Stage U-Net Beamforming Network

Cited by: 1
Authors
Lin, Wenmo [1]
Chen, Feilong [1]
Sun, Chengli [1]
Zhu, Zhenjun [1]
Affiliations
[1] School of Information and Engineering, Nanchang Hangkong University, Nanchang 330063, China
Keywords
3D speech signal - And filters - Deep learning - Downstream applications - Dual microphones - Enhancement technologies - Multiple inputs single outputs - Signal noise - Speech enhancement algorithm - Speech signals
DOI
10.3778/j.issn.1002-8331.2207-0352
Abstract
Noise in 3D reverberant environments degrades many downstream applications, so 3D speech enhancement that works in realistic scenes has both theoretical significance and practical value. This paper proposes a two-stage beamforming network for 3D speech enhancement in such environments. The network consists of two consecutive multiple-input single-output (MISO) U-Net beamforming networks. The first-stage network performs a coarse beamforming estimate on the 3D speech signal from the dual microphones and filters out part of the noise. To refine this estimate, the second-stage network takes the features of the coarse estimate together with the features of the omnidirectional channel of the original signal as input and performs a fine beamforming estimate, yielding a more accurate signal and completing the two-stage enhancement. The dataset and experiments are based on the 3D speech enhancement task of the L3DAS22 challenge. On the blind test set, the method obtains a short-time objective intelligibility (STOI) of 0.925 and a word error rate (WER) of 13.6%, significantly better than the champion model of the L3DAS21 3D speech enhancement challenge (0.878 and 21.2%). © 2023 Editorial Office Of Water Saving Irrigation. All rights reserved.
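The data flow described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `placeholder_net` is a hypothetical stand-in for a trained U-Net (a fixed random channel mixing), the 8-channel input and the choice of channel 0 as the omnidirectional channel are assumptions, and applying the second-stage weights to the coarse estimate plus the omnidirectional channel is one plausible reading of the abstract.

```python
import numpy as np


def filter_and_sum(weights, spec):
    """MISO beamforming step: weight each channel of a (C, F, T) complex
    spectrogram and sum over channels to get one (F, T) output."""
    return np.sum(weights * spec, axis=0)


def placeholder_net(features, n_out, seed):
    """Hypothetical stand-in for a trained U-Net (assumption, not the
    paper's model). Maps (C_in, F, T) complex features to (n_out, F, T)
    complex weights using a fixed random channel-mixing matrix."""
    rng = np.random.default_rng(seed)
    c_in = features.shape[0]
    mix = rng.standard_normal((n_out, c_in)) + 1j * rng.standard_normal((n_out, c_in))
    return np.einsum("oc,cft->oft", mix / c_in, features)


def two_stage_enhance(spec, omni_index=0):
    """Two-stage enhancement sketch.

    spec: (C, F, T) complex multi-channel spectrogram.
    omni_index: index of the omnidirectional channel (assumed to be 0).
    Returns the coarse and fine single-channel estimates, each (F, T).
    """
    n_ch = spec.shape[0]

    # Stage 1: estimate per-channel weights, then filter-and-sum (coarse).
    w1 = placeholder_net(spec, n_out=n_ch, seed=0)
    coarse = filter_and_sum(w1, spec)

    # Stage 2: coarse estimate plus the original omnidirectional channel
    # form the input; a second filter-and-sum gives the fine estimate.
    stage2_in = np.stack([coarse, spec[omni_index]])
    w2 = placeholder_net(stage2_in, n_out=2, seed=1)
    fine = filter_and_sum(w2, stage2_in)
    return coarse, fine


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    # Assumed toy input: 8 channels, 5 frequency bins, 7 frames.
    spec = rng.standard_normal((8, 5, 7)) + 1j * rng.standard_normal((8, 5, 7))
    coarse, fine = two_stage_enhance(spec)
    print(coarse.shape, fine.shape)
```

In a real system each `placeholder_net` call would be replaced by a trained network, and the fine estimate would be converted back to a waveform with an inverse STFT before computing STOI or WER.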
Pages: 128-135