Target Confusion in End-to-end Speaker Extraction: Analysis and Approaches

Cited by: 7
Authors
Zhao, Zifeng [1]
Yang, Dongchao [1]
Gu, Rongzhi [1]
Zhang, Haoran [1]
Zou, Yuexian [1]
Affiliations
[1] Peking University, School of Electronic and Computer Engineering (ECE), ADSPLAB, Shenzhen, People's Republic of China
Source
INTERSPEECH 2022
Funding
National Natural Science Foundation of China
Keywords
speech separation; end-to-end speaker extraction; target confusion problem; metric learning; post-filtering
DOI
10.21437/Interspeech.2022-176
Chinese Library Classification
O42 [Acoustics]
Subject Classification Codes
070206; 082403
Abstract
Recently, end-to-end speaker extraction has attracted increasing attention and shown promising results. However, its performance is often inferior to that of a blind speech separation (BSS) counterpart with a similar network architecture, because the auxiliary speaker encoder may sometimes generate ambiguous speaker embeddings. Such ambiguous guidance may confuse the separation network and lead to wrong extraction results, which deteriorates the overall performance. We refer to this as the target confusion problem. In this paper, we analyze this issue and address it in two stages. In the training phase, we propose to integrate metric learning methods to improve the distinguishability of the embeddings produced by the speaker encoder. For inference, a novel post-filtering strategy is designed to revise wrong results. Specifically, we first identify confusion samples by measuring the similarities between output estimates and enrollment utterances, after which the true target sources are recovered by a subtraction operation. Experiments show that our methods bring a performance improvement of more than 1 dB in SI-SDRi, which validates their effectiveness and emphasizes the impact of the target confusion problem.
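The post-filtering strategy summarized in the abstract can be sketched as follows. This is an illustrative reconstruction based only on the abstract, not the authors' released code: the `speaker_encoder` callable, the cosine-similarity measure, the 0.5 threshold, and the two-speaker assumption are all hypothetical placeholders.

```python
# Hedged sketch of the post-filtering idea: flag a likely confusion sample when
# the extracted estimate is dissimilar to the enrollment utterance in
# speaker-embedding space, then recover the target by subtracting the (wrong)
# estimate from the mixture. All names and values here are assumptions.
import numpy as np


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))


def post_filter(mixture: np.ndarray,
                estimate: np.ndarray,
                enrollment_embedding: np.ndarray,
                speaker_encoder,
                threshold: float = 0.5) -> np.ndarray:
    """Revise a suspected wrong extraction (two-speaker mixture assumed).

    If the estimate's speaker embedding is too dissimilar to the enrollment
    embedding, assume the interfering speaker was extracted instead and
    approximate the target as mixture - estimate.
    """
    estimate_embedding = speaker_encoder(estimate)
    similarity = cosine_similarity(estimate_embedding, enrollment_embedding)
    if similarity < threshold:
        # Confusion detected: subtract the wrong estimate from the mixture.
        return mixture - estimate
    return estimate
```

In a two-speaker mixture, subtracting a confidently wrong estimate (the interfering speaker) from the mixture leaves a rough approximation of the target source, which is why the detection step based on embedding similarity must precede the subtraction.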
Pages: 5333-5337
Number of pages: 5
Related Papers
50 records in total
  • [31] End-to-End Speaker-Attributed ASR with Transformer
    Kanda, Naoyuki
    Ye, Guoli
    Gaur, Yashesh
    Wang, Xiaofei
    Meng, Zhong
    Chen, Zhuo
    Yoshioka, Takuya
    INTERSPEECH 2021, 2021: 4413-4417
  • [32] Adversarial Regularization for End-to-end Robust Speaker Verification
    Wang, Qing
    Guo, Pengcheng
    Sun, Sining
    Xie, Lei
    Hansen, John H. L.
    INTERSPEECH 2019, 2019: 4010-4014
  • [33] Exploring Two Approaches for an End-to-End Scientific Analysis Workflow
    Dodelson, Scott
    Kent, Steve
    Kowalkowski, Jim
    Paterno, Marc
    Sehrish, Saba
    21ST INTERNATIONAL CONFERENCE ON COMPUTING IN HIGH ENERGY AND NUCLEAR PHYSICS (CHEP2015), PARTS 1-9, 2015, 664
  • [34] Approaches to end-to-end ecosystem models
    Fulton, Elizabeth A.
    JOURNAL OF MARINE SYSTEMS, 2010, 81 (1-2): 171-183
  • [35] Can Speaker Augmentation Improve Multi-Speaker End-to-End TTS?
    Cooper, Erica
    Lai, Cheng-I
    Yasuda, Yusuke
    Yamagishi, Junichi
    INTERSPEECH 2020, 2020: 3979-3983
  • [36] FRAME-LEVEL SPEAKER EMBEDDINGS FOR TEXT-INDEPENDENT SPEAKER RECOGNITION AND ANALYSIS OF END-TO-END MODEL
    Shon, Suwon
    Tang, Hao
    Glass, James
    2018 IEEE WORKSHOP ON SPOKEN LANGUAGE TECHNOLOGY (SLT 2018), 2018: 1007-1013
  • [37] SIAMESE CAPSULE NETWORK FOR END-TO-END SPEAKER RECOGNITION IN THE WILD
    Hajavi, Amirhossein
    Etemad, Ali
    2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021: 7203-7207
  • [38] Improved Relation Networks for End-to-End Speaker Verification and Identification
    Chaubey, Ashutosh
    Sinha, Sparsh
    Ghose, Susmita
    INTERSPEECH 2022, 2022: 5085-5089
  • [39] End-to-end recurrent denoising autoencoder embeddings for speaker identification
    Rituerto-González, Esther
    Peláez-Moreno, Carmen
    NEURAL COMPUTING AND APPLICATIONS, 2021, 33: 14429-14439
  • [40] SPEAKER VERIFICATION USING END-TO-END ADVERSARIAL LANGUAGE ADAPTATION
    Rohdin, Johan
    Stafylakis, Themos
    Silnova, Anna
    Zeinali, Hossein
    Burget, Lukas
    Plchot, Oldrich
    2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019: 6006-6010