SPEAKER-CONDITIONING SINGLE-CHANNEL TARGET SPEAKER EXTRACTION USING CONFORMER-BASED ARCHITECTURES

Cited by: 1
Authors
Sinha, Ragini [1]
Tammen, Marvin [2, 3]
Rollwage, Christian [1]
Doclo, Simon [1, 2, 3]
Affiliations
[1] Fraunhofer Inst Digital Media Technol IDMT, Oldenburg Branch Hearing Speech & Audio Technol H, Ilmenau, Germany
[2] Carl von Ossietzky Univ Oldenburg, Dept Med Phys & Acoust, Oldenburg, Germany
[3] Carl von Ossietzky Univ Oldenburg, Cluster Excellence Hearing4all, Oldenburg, Germany
Keywords
target speaker extraction; multi-task learning; TCN; attention; conformer
DOI
10.1109/IWAENC53105.2022.9914691
Chinese Library Classification number
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
Target speaker extraction aims to extract the target speaker from a mixture of multiple speakers by exploiting auxiliary information about the target speaker. In this paper, we consider a complete time-domain target speaker extraction system consisting of a speaker embedder network and a speaker separator network, which are jointly trained in an end-to-end learning process. We propose two different architectures for the speaker separator network, both based on the convolution-augmented transformer (conformer). The first architecture uses stacks of conformer and external feed-forward blocks (Conformer-FFN), while the second architecture uses stacks of temporal convolutional network (TCN) and conformer blocks (TCN-Conformer). Experimental results for 2-speaker mixtures, 3-speaker mixtures, and noisy 2-speaker mixtures show that, of the two proposed separator networks, the TCN-Conformer significantly improves target speaker extraction performance compared to both the Conformer-FFN and a TCN-based baseline system.
Pages: 5
Related papers
  • [1] Variants of LSTM cells for single-channel speaker-conditioned target speaker extraction
    Sinha, Ragini
    Rollwage, Christian
    Doclo, Simon
    EURASIP Journal on Audio, Speech, and Music Processing, 2024 (1)
  • [2] Single-Channel Target Speaker Extraction System with Attention Enhancement
    Lai, Yen-Ting
    Lin, Yi-En
    Chang, Pao-Chi
    Wang, Jia-Ching
    2022 IEEE INTERNATIONAL CONFERENCE ON CONSUMER ELECTRONICS - TAIWAN, IEEE ICCE-TW 2022, 2022, : 433 - 434
  • [3] SINGLE CHANNEL TARGET SPEAKER EXTRACTION AND RECOGNITION WITH SPEAKER BEAM
    Delcroix, Marc
    Zmolikova, Katerina
    Kinoshita, Keisuke
    Ogawa, Atsunori
    Nakatani, Tomohiro
    2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018, : 5554 - 5558
  • [4] SINGLE-CHANNEL SPEECH EXTRACTION USING SPEAKER INVENTORY AND ATTENTION NETWORK
    Xiao, Xiong
    Chen, Zhuo
    Yoshioka, Takuya
    Erdogan, Hakan
    Liu, Changliang
    Dimitriadis, Dimitrios
    Droppo, Jasha
    Gong, Yifan
    2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 86 - 90
  • [5] SINGLE-CHANNEL SPEAKER DIARIZATION BASED ON SPATIAL FEATURES
    Hu, Mathieu
    Parada, Pablo Peso
    Sharma, Dushyant
    Doclo, Simon
    van Waterschoot, Toon
    Brookes, Mike
    Naylor, Patrick A.
    2015 IEEE WORKSHOP ON APPLICATIONS OF SIGNAL PROCESSING TO AUDIO AND ACOUSTICS (WASPAA), 2015,
  • [6] Conformer-Based Speaker Recognition Model for Real-Time Multi-Scenarios
    Xuan, Xi
    Han, Runping
    Gao, Jingxin
    Computer Engineering and Applications, 2024, 60 (07) : 147 - 156
  • [7] Speaker Separation Using Visual Speech Features and Single-channel Audio
    Khan, Faheem
    Milner, Ben
    14TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2013), VOLS 1-5, 2013, : 3263 - 3267
  • [8] Single-Channel Multi-Speaker Separation using Deep Clustering
    Isik, Yusuf
    Le Roux, Jonathan
    Chen, Zhuo
    Watanabe, Shinji
    Hershey, John R.
    17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES, 2016, : 545 - 549
  • [9] Speaker Verification-Based Evaluation of Single-Channel Speech Separation
    Maciejewski, Matthew
    Watanabe, Shinji
    Khudanpur, Sanjeev
    INTERSPEECH 2021, 2021, : 3520 - 3524
  • [10] Soft mask methods for single-channel speaker separation
    Reddy, Aarthi M.
    Raj, Bhiksha
    IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2007, 15 (06) : 1766 - 1776