Target Speaker Extraction with Attention Enhancement and Gated Fusion Mechanism

Cited by: 1

Authors:

Wang, Sijie [1,2]

Hamdulla, Askar [1,2]

Ablimit, Mijit [1,2]

Affiliations:

[1] Xinjiang Univ, Sch Informat Sci & Engn, Urumqi, Peoples R China

[2] Key Lab Signal Detect & Proc, Urumqi, Peoples R China

Source:

2023 ASIA PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE, APSIPA ASC | 2023

Keywords:

target speaker extraction; attention; gated fusion; multi-task learning; NETWORK;

DOI:

10.1109/APSIPAASC58517.2023.10317106

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

The objective of a target speaker extraction system is to extract the speech of the target speaker from a mixture of multiple speakers and noise, using a certain amount of additional information about the target speaker. In this paper, we investigate improvements to the baseline system obtained by incorporating the lightweight CBAM module into the target extractor and the gated fusion module (GFM) into the fusion layer. The CBAM introduces attention enhancement to the baseline model with no significant increase in the number of parameters or in complexity. The GFM replaces the previous concatenation-based method for fusing the speaker embedding with the input mixture (or intermediate output), enabling the model to better leverage the supplementary information provided by the speaker embedding. Experimental results on datasets built from WSJ0-2mix and WHAM! demonstrate that the CBAM module and the lightweight GFM module each improve model performance individually, and that the GFM module yields the larger improvement on WHAM!. However, the combination of the two modules is mutually beneficial only on the clean dataset WSJ0-2mix; on the noisy dataset WHAM!, the combined module performs worse than the GFM module alone.
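The abstract contrasts concatenation-based fusion with gated fusion but does not give the GFM's equations. The following is only a generic sketch of the two ideas, not the authors' module: the function names, the weight shapes, and the convex-combination form of the gate are all assumptions chosen for illustration.

```python
import math
import random


def sigmoid(z):
    """Logistic function, used to squash gate activations into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def concat_fusion(frame, emb):
    """Concatenation-based fusion: append the speaker embedding to the frame."""
    return frame + emb


def gated_fusion(frame, emb, W_gate, b_gate, W_emb):
    """Hypothetical gated fusion (an assumed form, not the paper's GFM).

    A per-feature gate g in (0, 1) is computed from the frame and the
    embedding together; the output blends the frame with a projection of
    the embedding as g * frame + (1 - g) * proj(emb).
    """
    joint = frame + emb
    gate = [sigmoid(z + b) for z, b in zip(matvec(W_gate, joint), b_gate)]
    emb_proj = matvec(W_emb, emb)
    return [g * x + (1.0 - g) * e for g, x, e in zip(gate, frame, emb_proj)]


# Toy dimensions and random weights, purely illustrative.
random.seed(0)
F, D = 4, 3  # frame feature size, speaker embedding size
frame = [random.uniform(-1, 1) for _ in range(F)]
emb = [random.uniform(-1, 1) for _ in range(D)]
W_gate = [[random.uniform(-1, 1) for _ in range(F + D)] for _ in range(F)]
b_gate = [0.0] * F
W_emb = [[random.uniform(-1, 1) for _ in range(D)] for _ in range(F)]

fused_concat = concat_fusion(frame, emb)                      # length F + D
fused_gated = gated_fusion(frame, emb, W_gate, b_gate, W_emb)  # length F
```

In this sketch, concatenation grows the feature dimension and leaves it to later layers to decide how to use the embedding, whereas the gate weights the embedding's contribution per feature while keeping the frame dimension unchanged.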

Citation

Pages: 1995-2001

Number of pages: 7

50 items in total

[1] MULTIMODAL ATTENTION FUSION FOR TARGET SPEAKER EXTRACTION
Sato, Hiroshi
Ochiai, Tsubasa
Kinoshita, Keisuke
Delcroix, Marc
Nakatani, Tomohiro
Araki, Shoko
2021 IEEE SPOKEN LANGUAGE TECHNOLOGY WORKSHOP (SLT), 2021, : 778 - 784
[2] Hierarchic Temporal Convolutional Network with Attention Fusion for Target Speaker Extraction
Chen, Zihao
Qiu, Wenbo
Xu, Haitao
Hu, Ying
PROCEEDINGS OF 2022 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2022, : 827 - 832
[3] Contrastive Learning for Target Speaker Extraction With Attention-Based Fusion
Li, Xiao
Liu, Ruirui
Huang, Huichou
Wu, Qingyao
IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2024, 32 : 178 - 188
[4] Single-Channel Target Speaker Extraction System with Attention Enhancement
Lai, Yen-Ting
Lin, Yi-En
Chang, Pao-Chi
Wang, Jia-Ching
2022 IEEE INTERNATIONAL CONFERENCE ON CONSUMER ELECTRONICS - TAIWAN, IEEE ICCE-TW 2022, 2022, : 433 - 434
[5] Gated Convolutional Fusion for Time-Domain Target Speaker Extraction Network
Liu, Wenjing
Xie, Chuan
INTERSPEECH 2022, 2022, : 5368 - 5372
[6] Binaural Selective Attention Model for Target Speaker Extraction
Meng, Hanyu
Zhang, Qiquan
Zhang, Xiangyu
Sethu, Vidhyasaharan
Ambikairajah, Eliathamby
INTERSPEECH 2024, 2024, : 4323 - 4327
[7] SPEAKER-AWARE TARGET SPEAKER ENHANCEMENT BY JOINTLY LEARNING WITH SPEAKER EMBEDDING EXTRACTION
Ji, Xuan
Yu, Meng
Zhang, Chunlei
Su, Dan
Yu, Tao
Liu, Xiaoyu
Yu, Dong
2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 7294 - 7298
[8] Gated Dynamic Attention Mechanism towards Aspect Extraction
Cheng M.
Hong Y.
Tang J.
Zhang J.
Zou B.
Yao J.
Moshi Shibie yu Rengong Zhineng/Pattern Recognition and Artificial Intelligence, 2019, 32 (02): : 184 - 192
[9] Speaker extraction network with attention mechanism for speech dialogue system
Hao, Yun
Wu, Jiaju
Huang, Xiangkang
Zhang, Zijia
Liu, Fei
Wu, Qingyao
SERVICE ORIENTED COMPUTING AND APPLICATIONS, 2022, 16 (02) : 111 - 119
