Target Speaker Extraction with Attention Enhancement and Gated Fusion Mechanism

Cited by: 1
Authors
Wang, Sijie [1, 2]
Hamdulla, Askar [1, 2]
Ablimit, Mijit [1, 2]
Affiliations
[1] Xinjiang Univ, Sch Informat Sci & Engn, Urumqi, Peoples R China
[2] Key Lab Signal Detect & Proc, Urumqi, Peoples R China
Keywords
target speaker extraction; attention; gated fusion; multi-task learning; NETWORK;
DOI
10.1109/APSIPAASC58517.2023.10317106
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
A target speaker extraction system aims to extract the target speaker's speech from a mixture of multiple speakers and noise, using a certain amount of additional information about the target speaker. In this paper, we investigate improvements to the baseline system obtained by incorporating the lightweight CBAM module into the target extractor and the gated fusion module (GFM) into the fusion layer. CBAM adds attention enhancement to the baseline model without significantly increasing the number of parameters or the complexity. The GFM replaces the previous concatenation-based method for fusing the speaker embedding with the input mixture (or an intermediate output), enabling the model to better leverage the supplementary information provided by the speaker embedding. Experimental results on datasets built from WSJ0-2mix and WHAM! demonstrate that the CBAM module and the lightweight GFM module each improve model performance individually, with the GFM showing the greater improvement on WHAM!. However, combining the two modules is mutually beneficial only on the clean dataset WSJ0-2mix; on the noisy dataset WHAM!, the combined modules perform worse than the GFM alone.
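The contrast the abstract draws between concatenation-based fusion and gated fusion can be illustrated with a minimal sketch. This is not the paper's GFM, whose exact form is not given in this record: the gate formula below is one generic gated-fusion variant, and every name, dimension, and weight (`gated_fusion`, `w_gate`, `w_proj`, `feat_dim`, `emb_dim`) is a hypothetical choice made for illustration only.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def linear(weights, bias, vec):
    """Dense layer: weights is a list of rows, one row per output unit."""
    return [sum(w * v for w, v in zip(row, vec)) + b
            for row, b in zip(weights, bias)]

def concat_fusion(frame, spk_emb):
    """Concatenation-based fusion: append the speaker embedding to the frame."""
    return frame + spk_emb

def gated_fusion(frame, spk_emb, w_gate, b_gate, w_proj, b_proj):
    """Generic gated fusion (assumed form, not the paper's exact GFM).

    A sigmoid gate computed from both inputs decides, per feature, how much
    of the mixture frame versus the projected speaker embedding to keep.
    """
    gate = [sigmoid(z) for z in linear(w_gate, b_gate, frame + spk_emb)]
    spk_proj = [math.tanh(z) for z in linear(w_proj, b_proj, spk_emb)]
    return [g * x + (1.0 - g) * s for g, x, s in zip(gate, frame, spk_proj)]

def random_matrix(rows, cols, rng):
    """Random weights standing in for parameters a real model would learn."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

rng = random.Random(0)
feat_dim, emb_dim = 4, 3                # hypothetical sizes
frame = [0.2, -1.0, 0.5, 0.8]           # one frame of mixture features
spk_emb = [0.1, 0.7, -0.3]              # target speaker embedding
w_gate = random_matrix(feat_dim, feat_dim + emb_dim, rng)
b_gate = [0.0] * feat_dim
w_proj = random_matrix(feat_dim, emb_dim, rng)
b_proj = [0.0] * feat_dim

concat_out = concat_fusion(frame, spk_emb)
gated_out = gated_fusion(frame, spk_emb, w_gate, b_gate, w_proj, b_proj)
print(len(concat_out), len(gated_out))  # prints "7 4"
```

In this sketch, concatenation grows the feature vector and leaves the following layer to work out how to use the embedding, whereas the gate weighs the embedding's contribution per feature and keeps the output at the frame's dimensionality.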
Pages: 1995-2001
Page count: 7
Related papers
50 in total
  • [1] MULTIMODAL ATTENTION FUSION FOR TARGET SPEAKER EXTRACTION
    Sato, Hiroshi
    Ochiai, Tsubasa
    Kinoshita, Keisuke
    Delcroix, Marc
    Nakatani, Tomohiro
    Araki, Shoko
    2021 IEEE SPOKEN LANGUAGE TECHNOLOGY WORKSHOP (SLT), 2021, : 778 - 784
  • [2] Hierarchic Temporal Convolutional Network with Attention Fusion for Target Speaker Extraction
    Chen, Zihao
    Qiu, Wenbo
    Xu, Haitao
    Hu, Ying
    PROCEEDINGS OF 2022 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2022, : 827 - 832
  • [3] Contrastive Learning for Target Speaker Extraction With Attention-Based Fusion
    Li, Xiao
    Liu, Ruirui
    Huang, Huichou
    Wu, Qingyao
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2024, 32 : 178 - 188
  • [4] Single-Channel Target Speaker Extraction System with Attention Enhancement
    Lai, Yen-Ting
    Lin, Yi-En
    Chang, Pao-Chi
    Wang, Jia-Ching
    2022 IEEE INTERNATIONAL CONFERENCE ON CONSUMER ELECTRONICS - TAIWAN, IEEE ICCE-TW 2022, 2022, : 433 - 434
  • [5] Gated Convolutional Fusion for Time-Domain Target Speaker Extraction Network
    Liu, Wenjing
    Xie, Chuan
    INTERSPEECH 2022, 2022, : 5368 - 5372
  • [6] Binaural Selective Attention Model for Target Speaker Extraction
    Meng, Hanyu
    Zhang, Qiquan
    Zhang, Xiangyu
    Sethu, Vidhyasaharan
    Ambikairajah, Eliathamby
    INTERSPEECH 2024, 2024, : 4323 - 4327
  • [7] SPEAKER-AWARE TARGET SPEAKER ENHANCEMENT BY JOINTLY LEARNING WITH SPEAKER EMBEDDING EXTRACTION
    Ji, Xuan
    Yu, Meng
    Zhang, Chunlei
    Su, Dan
    Yu, Tao
    Liu, Xiaoyu
    Yu, Dong
    2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 7294 - 7298
  • [8] Gated Dynamic Attention Mechanism towards Aspect Extraction
    Cheng M.
    Hong Y.
    Tang J.
    Zhang J.
    Zou B.
    Yao J.
    Moshi Shibie yu Rengong Zhineng/Pattern Recognition and Artificial Intelligence, 2019, 32 (02): : 184 - 192
  • [9] Speaker extraction network with attention mechanism for speech dialogue system
    Hao, Yun
    Wu, Jiaju
    Huang, Xiangkang
    Zhang, Zijia
    Liu, Fei
    Wu, Qingyao
    SERVICE ORIENTED COMPUTING AND APPLICATIONS, 2022, 16 (02) : 111 - 119