PARTIAL AUC OPTIMIZATION BASED DEEP SPEAKER EMBEDDINGS WITH CLASS-CENTER LEARNING FOR TEXT-INDEPENDENT SPEAKER VERIFICATION

Cited by: 0
Authors
Bai, Zhongxin [1 ,2 ]
Zhang, Xiao-Lei [1 ,2 ]
Chen, Jingdong [1 ,2 ]
Affiliations
[1] Northwestern Polytech Univ, Ctr Intelligent Acoust & Immers Commun, Xian, Peoples R China
[2] Northwestern Polytech Univ, Sch Marine Sci & Technol, Xian, Peoples R China
Funding
Israel Science Foundation; US National Science Foundation;
Keywords
speaker verification; pAUC optimization; speaker centers; verification loss; RECOGNITION;
DOI
10.1109/icassp40776.2020.9053674
Chinese Library Classification number
O42 [Acoustics];
Discipline classification code
070206 ; 082403 ;
Abstract
Deep embedding based text-independent speaker verification has demonstrated superior performance to traditional methods in many challenging scenarios. Its loss functions can be generally categorized into two classes, i.e., verification and identification. The verification loss functions match the pipeline of speaker verification, but their implementations are difficult. Thus, most state-of-the-art deep embedding methods use the identification loss functions with softmax output units or their variants. In this paper, we propose a verification loss function, named the maximization of partial area under the receiver operating characteristic (ROC) curve (pAUC), for deep embedding based text-independent speaker verification. We also propose a class-center based training trial construction method to improve the training efficiency, which is critical for the proposed loss function to be comparable to the identification loss in performance. Experiments on the Speakers in the Wild (SITW) and NIST SRE 2016 datasets show that the proposed pAUC loss function is highly competitive with the state-of-the-art identification loss functions.
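As an illustration of the quantity named in the abstract, the following is a minimal sketch of an empirical pAUC and a hinge-style surrogate loss over verification trial scores. It is not the authors' implementation: the function names, the rank-based selection of non-target trials, the squared-hinge surrogate, and the `margin` parameter are all assumptions made for this example. The abstract does not specify them.

```python
import math


def select_hard_nontargets(nontarget_scores, alpha, beta):
    """Keep non-target scores whose rank falls in the FPR range [alpha, beta].

    Assumption: the range is mapped to ranks with ceil/floor on the
    descending-sorted non-target scores.
    """
    ranked = sorted(nontarget_scores, reverse=True)  # hardest negatives first
    lo = math.ceil(alpha * len(ranked))
    hi = math.floor(beta * len(ranked))
    hard = ranked[lo:hi]
    if not hard:
        raise ValueError("FPR range selects no non-target trials")
    return hard


def partial_auc(target_scores, nontarget_scores, alpha=0.0, beta=0.1):
    """Normalized empirical pAUC: fraction of (target, hard non-target)
    pairs ranked correctly, counting ties as one half."""
    if not target_scores:
        raise ValueError("no target trials")
    hard = select_hard_nontargets(nontarget_scores, alpha, beta)
    wins = sum(
        1.0 if t > s else 0.5 if t == s else 0.0
        for t in target_scores
        for s in hard
    )
    return wins / (len(target_scores) * len(hard))


def pauc_hinge_loss(target_scores, nontarget_scores,
                    alpha=0.0, beta=0.1, margin=0.5):
    """Hypothetical differentiable surrogate: mean squared hinge on the
    score gap of every (target, hard non-target) pair."""
    if not target_scores:
        raise ValueError("no target trials")
    hard = select_hard_nontargets(nontarget_scores, alpha, beta)
    losses = [max(0.0, margin - (t - s)) ** 2
              for t in target_scores for s in hard]
    return sum(losses) / len(losses)


targets = [0.9, 0.4]
nontargets = [0.8, 0.6, 0.3, 0.1]
print(partial_auc(targets, nontargets, alpha=0.0, beta=0.5))  # 0.5
print(partial_auc(targets, nontargets, alpha=0.0, beta=1.0))  # 0.75
```

With `beta=1.0` the function reduces to the ordinary AUC. Restricting the range to `beta=0.5` keeps only the two highest-scoring non-target trials, so the same scores yield a lower value. This is the sense in which a pAUC objective concentrates on the hard trials in a low false-positive-rate region.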
Pages: 6819 - 6823
Page count: 5
Related papers
50 in total
  • [31] ORTHOGONAL TRAINING FOR TEXT-INDEPENDENT SPEAKER VERIFICATION
    Zhu, Yingke
    Mak, Brian
    2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 6584 - 6588
  • [32] Speaker adaptive cohort selection for Tnorm in text-independent speaker verification
    Sturim, DE
    Reynolds, DA
    2005 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS 1-5: SPEECH PROCESSING, 2005, : 741 - 744
  • [33] Triplet Based Embedding Distance and Similarity Learning for Text-independent Speaker Verification
    Ren, Zongze
    Chen, Zhiyong
    Xu, Shugong
    2019 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2019, : 558 - 562
  • [34] A Text-Independent Speaker Verification System Based on Cross Entropy
    Lu, Xiaochun
    Yin, Junxun
    COMPUTATIONAL INTELLIGENCE AND INTELLIGENT SYSTEMS, 2009, 51 : 419 - 426
  • [35] Text-independent speaker verification based on relation of MFCC components
    Ou, GW
    Ke, DF
    2004 International Symposium on Chinese Spoken Language Processing, Proceedings, 2004, : 57 - 60
  • [36] Improving the Generalized Performance of Deep Embedding for Text-Independent Speaker Verification
    Li, Rongjin
    Li, Lin
    Hong, Qingyang
    Guo, Huiyang
    Zhao, Miao
    PROCEEDINGS OF 2018 12TH IEEE INTERNATIONAL CONFERENCE ON ANTI-COUNTERFEITING, SECURITY, AND IDENTIFICATION (ASID), 2018, : 21 - 25
  • [37] CONTRASTIVE SELF-SUPERVISED LEARNING FOR TEXT-INDEPENDENT SPEAKER VERIFICATION
    Zhang, Haoran
    Zou, Yuexian
    Wang, Helin
    2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 6713 - 6717
  • [38] Text-Independent Speaker Verification Based on Deep Neural Networks and Segmental Dynamic Time Warping
    Adel, Mohamed
    Afify, Mohamed
    Gaballah, Akram
    Fayek, Magda
    2018 IEEE WORKSHOP ON SPOKEN LANGUAGE TECHNOLOGY (SLT 2018), 2018, : 1001 - 1006
  • [39] End-to-End Feature Learning for Text-Independent Speaker Verification
    Chen, Fangzhou
    Bian, Tengyue
    Xu, Li
    PROCEEDINGS OF THE 2019 31ST CHINESE CONTROL AND DECISION CONFERENCE (CCDC 2019), 2019, : 3949 - 3954
  • [40] DEEP BOTTLENECK FEATURES FOR I-VECTOR BASED TEXT-INDEPENDENT SPEAKER VERIFICATION
    Ghalehjegh, Sina Hamidi
    Rose, Richard C.
    2015 IEEE WORKSHOP ON AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING (ASRU), 2015, : 555 - 560