Target Speaker Extraction with Attention Enhancement and Gated Fusion Mechanism

Cited by: 1
Authors
Wang, Sijie [1, 2]
Hamdulla, Askar [1, 2]
Ablimit, Mijit [1, 2]
Affiliations
[1] Xinjiang Univ, Sch Informat Sci & Engn, Urumqi, Peoples R China
[2] Key Lab Signal Detect & Proc, Urumqi, Peoples R China
Keywords
target speaker extraction; attention; gated fusion; multi-task learning; NETWORK;
DOI
10.1109/APSIPAASC58517.2023.10317106
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The objective of a target speaker extraction system is to extract the speech of the target speaker from a mixture of multiple speakers and noise, using a certain amount of additional information about the target speaker. In this paper, we investigate improvements to the baseline system obtained by incorporating the lightweight CBAM module in the target extractor and the gated fusion module (GFM) in the fusion layer. The CBAM adds attention enhancement to the baseline model with no significant increase in the number of parameters or complexity. The GFM replaces the previous concatenation-based fusion of the speaker embedding with the input mixture (or intermediate output), enabling the model to better leverage the supplementary information provided by the speaker embedding. Experimental results on datasets built from WSJ0-2mix and WHAM! demonstrate that the CBAM module and the lightweight GFM module each improve model performance individually, and that the GFM module shows the greater improvement on WHAM!. However, combining the two modules is mutually beneficial only on the clean dataset WSJ0-2mix; on the noisy dataset WHAM!, the combined modules perform worse than the GFM module alone.
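The contrast the abstract draws between concatenation-based fusion and gated fusion can be illustrated with a minimal sketch. This is not the paper's GFM, whose exact formulation is not given in this record; it shows one common gating form, in which a sigmoid gate computed from both inputs scales how much of the speaker embedding is added to each feature channel. The function names, the weight matrices `W_f` and `W_s`, and the assumption that frame features and speaker embedding share the same dimension are all hypothetical.

```python
import math


def sigmoid(z):
    """Logistic function mapping a real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def concat_fusion(frame, spk_emb):
    """Concatenation-style fusion: append the speaker embedding to the frame features."""
    return frame + spk_emb


def gated_fusion(frame, spk_emb, W_f, W_s, bias):
    """Illustrative gated fusion (an assumed form, not the paper's GFM).

    gate  = sigmoid(W_f @ frame + W_s @ spk_emb + bias)
    fused = frame + gate * spk_emb   (element-wise)
    """
    pre = [a + b + c
           for a, b, c in zip(matvec(W_f, frame), matvec(W_s, spk_emb), bias)]
    gate = [sigmoid(p) for p in pre]
    return [x + g * s for x, g, s in zip(frame, gate, spk_emb)]


if __name__ == "__main__":
    frame = [1.0, 2.0]      # hypothetical mixture features for one frame
    spk_emb = [0.5, -1.0]   # hypothetical speaker embedding of the same size
    zeros = [[0.0, 0.0], [0.0, 0.0]]
    print(concat_fusion(frame, spk_emb))
    print(gated_fusion(frame, spk_emb, zeros, zeros, [0.0, 0.0]))
```

Unlike concatenation, the gated form keeps the feature dimension unchanged and lets learned parameters decide, per channel, how strongly the speaker information is injected.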
Pages: 1995-2001
Number of pages: 7