SPEAKER-CONDITIONING SINGLE-CHANNEL TARGET SPEAKER EXTRACTION USING CONFORMER-BASED ARCHITECTURES

Cited by: 1

Authors:

Sinha, Ragini [1]

Tammen, Marvin [2,3]

Rollwage, Christian [1]

Doclo, Simon [1,2,3]

Affiliations:

[1] Fraunhofer Inst Digital Media Technol IDMT, Oldenburg Branch Hearing Speech & Audio Technol H, Ilmenau, Germany

[2] Carl von Ossietzky Univ Oldenburg, Dept Med Phys & Acoust, Oldenburg, Germany

[3] Carl von Ossietzky Univ Oldenburg, Cluster Excellence Hearing4all, Oldenburg, Germany

Source:

2022 INTERNATIONAL WORKSHOP ON ACOUSTIC SIGNAL ENHANCEMENT (IWAENC 2022) | 2022

Keywords:

target speaker extraction; multi-task learning; TCN; attention; conformer

DOI:

10.1109/IWAENC53105.2022.9914691

Chinese Library Classification (CLC) number:

O42 [Acoustics]

Discipline classification codes:

070206; 082403

Abstract:

Target speaker extraction aims to extract the target speaker from a mixture of multiple speakers by exploiting auxiliary information about the target speaker. In this paper, we consider a complete time-domain target speaker extraction system consisting of a speaker embedder network and a speaker separator network, which are jointly trained in an end-to-end learning process. We propose two architectures for the speaker separator network, both based on the convolution-augmented transformer (conformer). The first architecture uses stacks of conformer and external feed-forward blocks (Conformer-FFN), while the second uses stacks of temporal convolutional network (TCN) and conformer blocks (TCN-Conformer). Experimental results for 2-speaker mixtures, 3-speaker mixtures, and noisy 2-speaker mixtures show that, of the two proposed separator networks, the TCN-Conformer significantly improves target speaker extraction performance compared to both the Conformer-FFN and a TCN-based baseline system.

Pages: 5

