Target Speaker Extraction with Attention Enhancement and Gated Fusion Mechanism

Cited by: 1
Authors
Wang, Sijie [1, 2]
Hamdulla, Askar [1, 2]
Ablimit, Mijit [1, 2]
Affiliations
[1] Xinjiang Univ, Sch Informat Sci & Engn, Urumqi, Peoples R China
[2] Key Lab Signal Detect & Proc, Urumqi, Peoples R China
Keywords
target speaker extraction; attention; gated fusion; multi-task learning; NETWORK;
DOI
10.1109/APSIPAASC58517.2023.10317106
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
A target speaker extraction system aims to extract the target speaker's speech from a mixture of multiple speakers and noise, using a certain amount of auxiliary information about the target speaker. In this paper, we investigate improving a baseline system by incorporating the lightweight Convolutional Block Attention Module (CBAM) in the target extractor and a gated fusion module (GFM) in the fusion layer. CBAM adds attention enhancement to the baseline model without significantly increasing the number of parameters or the complexity. The GFM replaces the previous concatenation-based method for fusing the speaker embedding with the input mixture (or an intermediate output), enabling the model to make better use of the supplementary information that the speaker embedding provides. Experimental results on datasets built from WSJ0-2mix and WHAM! demonstrate that CBAM and the lightweight GFM each improve model performance individually, and the GFM shows better improvement on WHAM!. However, combining the two modules is mutually beneficial only on the clean dataset WSJ0-2mix; on the noisy dataset WHAM!, the combination performs worse than the GFM alone.
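The contrast the abstract draws between concatenation-based fusion and gated fusion can be illustrated with a minimal sketch. This is not the paper's GFM: the abstract does not give its equations, so the gate formula, the parameter names (`W_g`, `b_g`, `W_e`, `b_e`), the random initialisation, and the toy dimensions below are all assumptions chosen for illustration. The sketch fuses a single feature frame with a speaker embedding in plain Python.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def linear(weights, bias, vec):
    """Return weights @ vec + bias for a list-of-rows weight matrix."""
    return [sum(w * v for w, v in zip(row, vec)) + b
            for row, b in zip(weights, bias)]


def concat_fusion(frame, speaker_emb):
    """Concatenation-style fusion: append the embedding to the frame."""
    return frame + speaker_emb


def gated_fusion(frame, speaker_emb, params):
    """Hypothetical gated fusion for one frame (assumed form, not the paper's).

    gate  = sigmoid(W_g [frame; emb] + b_g)   one value in (0, 1) per channel
    cue   = W_e emb + b_e                      embedding projected to frame size
    fused = gate * frame + (1 - gate) * cue
    """
    gate = [sigmoid(z)
            for z in linear(params["W_g"], params["b_g"], frame + speaker_emb)]
    cue = linear(params["W_e"], params["b_e"], speaker_emb)
    return [g * x + (1.0 - g) * c for g, x, c in zip(gate, frame, cue)]


def init_params(feat_dim, emb_dim, seed=0):
    """Random illustrative weights; a real model would learn these."""
    rng = random.Random(seed)

    def rand(rows, cols):
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)]
                for _ in range(rows)]

    return {
        "W_g": rand(feat_dim, feat_dim + emb_dim),
        "b_g": [0.0] * feat_dim,
        "W_e": rand(feat_dim, emb_dim),
        "b_e": [0.0] * feat_dim,
    }


# Toy sizes: 4 feature channels, 3-dimensional speaker embedding.
feat_dim, emb_dim = 4, 3
params = init_params(feat_dim, emb_dim)
frame = [0.2, -1.0, 0.5, 0.0]
speaker_emb = [1.0, 0.0, -1.0]

concat_out = concat_fusion(frame, speaker_emb)        # length feat_dim + emb_dim
gated_out = gated_fusion(frame, speaker_emb, params)  # length feat_dim
```

In this sketch, concatenation widens the feature vector and leaves it to later layers to decide how to use the embedding, whereas the gate decides per channel how much of the mixture feature to keep and how much of the projected speaker cue to inject, so the output keeps the frame's original dimensionality.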
Pages: 1995 - 2001
Number of pages: 7