SPEAKER-CONDITIONING SINGLE-CHANNEL TARGET SPEAKER EXTRACTION USING CONFORMER-BASED ARCHITECTURES

Cited by: 1
Authors
Sinha, Ragini [1]
Tammen, Marvin [2, 3]
Rollwage, Christian [1]
Doclo, Simon [1, 2, 3]
Affiliations
[1] Fraunhofer Inst Digital Media Technol IDMT, Oldenburg Branch Hearing Speech & Audio Technol H, Ilmenau, Germany
[2] Carl von Ossietzky Univ Oldenburg, Dept Med Phys & Acoust, Oldenburg, Germany
[3] Carl von Ossietzky Univ Oldenburg, Cluster Excellence Hearing4all, Oldenburg, Germany
Keywords
target speaker extraction; multi-task learning; TCN; attention; conformer;
DOI
10.1109/IWAENC53105.2022.9914691
Chinese Library Classification number
O42 [Acoustics];
Discipline classification code
070206 ; 082403 ;
Abstract
Target speaker extraction aims at extracting the target speaker from a mixture of multiple speakers by exploiting auxiliary information about the target speaker. In this paper, we consider a complete time-domain target speaker extraction system consisting of a speaker embedder network and a speaker separator network, which are jointly trained in an end-to-end learning process. We propose two different architectures for the speaker separator network, both based on the convolution-augmented transformer (conformer). The first architecture uses stacks of conformer and external feed-forward blocks (Conformer-FFN), while the second uses stacks of temporal convolutional network (TCN) and conformer blocks (TCN-Conformer). Experimental results for 2-speaker mixtures, 3-speaker mixtures, and noisy 2-speaker mixtures show that, of the two proposed separator networks, the TCN-Conformer significantly improves target speaker extraction performance compared to both the Conformer-FFN and a TCN-based baseline system.
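The two separator layouts named in the abstract can be illustrated with a small structural sketch. It only records which block types are stacked in each variant. The alternating order within a stack, the default number of stacks, and the names `SeparatorLayout` and `build_layout` are assumptions made for illustration; the abstract does not specify them, and no actual network layers are implemented here.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class SeparatorLayout:
    """Hypothetical description of a separator network as an ordered block list."""
    name: str
    blocks: List[str]


def build_layout(kind: str, num_stacks: int = 2) -> SeparatorLayout:
    """Return the block sequence for one separator variant.

    The repeating pattern per stack is an assumption: the abstract only says
    which block types each variant combines, not how they are ordered.
    """
    if kind == "Conformer-FFN":
        pattern = ["conformer", "ffn"]   # conformer + external feed-forward block
    elif kind == "TCN-Conformer":
        pattern = ["tcn", "conformer"]   # TCN block + conformer block
    elif kind == "TCN":
        pattern = ["tcn"]                # TCN-only baseline
    else:
        raise ValueError(f"unknown separator kind: {kind!r}")
    if num_stacks < 1:
        raise ValueError("num_stacks must be at least 1")
    return SeparatorLayout(name=kind, blocks=pattern * num_stacks)


if __name__ == "__main__":
    for kind in ("Conformer-FFN", "TCN-Conformer", "TCN"):
        layout = build_layout(kind)
        print(f"{layout.name}: {' -> '.join(layout.blocks)}")
```

In the system the abstract describes, each variant would additionally be conditioned on the output of the jointly trained speaker embedder network; that conditioning is omitted from this sketch.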
Pages: 5