Mixup Learning Strategies for Text-independent Speaker Verification

Cited by: 20
Authors
Zhu, Yingke [1]
Ko, Tom [2]
Mak, Brian [1]
Affiliations
[1] Hong Kong Univ Sci & Technol, Dept Comp Sci & Engn, Hong Kong, Peoples R China
[2] South Univ Sci & Technol, Dept Comp Sci & Engn, Shenzhen, Peoples R China
Source
INTERSPEECH 2019
Keywords
speaker recognition; deep neural networks; mixup; x-vectors;
DOI
10.21437/Interspeech.2019-2250
CLC (Chinese Library Classification)
R36 [Pathology]; R76 [Otorhinolaryngology];
Discipline Code
100104; 100213;
Abstract
Mixup is a learning strategy that constructs additional virtual training samples from existing training samples by linearly interpolating random pairs of them. It has been shown that mixup can help avoid data memorization and thus improve model generalization. This paper investigates the mixup learning strategy for training a speaker-discriminative deep neural network (DNN) for better text-independent speaker verification. In recent speaker verification systems, a DNN is usually trained to classify the speakers in the training set. At the same time, the DNN learns a low-dimensional embedding of speakers, so that a speaker embedding can be generated for any speaker during evaluation. We adapted the mixup strategy to the speaker-discriminative DNN training procedure and studied different mixup schemes, such as performing mixup on MFCC features or on raw audio samples. The mixup learning strategy was evaluated on the NIST SRE 2010, SRE 2016 and SITW evaluation sets. Experimental results show consistent performance improvements of up to 13% relative in terms of both EER and DCF. We further find that mixup training also consistently improves the DNN's speaker classification accuracy without requiring any additional data sources.
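For readers unfamiliar with mixup, the sketch below illustrates the standard formulation from Zhang et al. (2018) that the abstract describes: each virtual sample is a convex combination of two training examples, x~ = lambda*x_i + (1-lambda)*x_j and y~ = lambda*y_i + (1-lambda)*y_j, with lambda drawn from Beta(alpha, alpha). This is a minimal, hypothetical NumPy illustration of batch-level mixup on acoustic features with one-hot speaker labels; the function and parameter names (mixup_batch, alpha) are invented for the example and do not reflect the authors' exact implementation, which also explores mixing raw audio samples.

import numpy as np

def mixup_batch(features, labels, alpha=0.2, rng=None):
    # features: (batch, dim) array, e.g. MFCC frames or raw-audio chunks.
    # labels:   (batch, num_speakers) one-hot speaker labels.
    # alpha:    Beta-distribution parameter controlling interpolation strength.
    rng = np.random.default_rng() if rng is None else rng
    batch_size = features.shape[0]

    # lambda ~ Beta(alpha, alpha), one interpolation weight per sample
    lam = rng.beta(alpha, alpha, size=(batch_size, 1))

    # Pair each sample with a randomly permuted partner from the same batch
    perm = rng.permutation(batch_size)

    mixed_features = lam * features + (1.0 - lam) * features[perm]
    mixed_labels = lam * labels + (1.0 - lam) * labels[perm]
    return mixed_features, mixed_labels

# Example: 4 feature vectors of dimension 24, 3 hypothetical speakers
x = np.random.randn(4, 24)
y = np.eye(3)[[0, 1, 2, 0]]  # one-hot speaker labels
x_mix, y_mix = mixup_batch(x, y, alpha=0.2)

The mixed pairs are fed to the speaker-classification DNN in place of (or alongside) the original samples; the soft labels make the cross-entropy loss interpolate between the two speakers' targets.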
Pages: 4345-4349
Number of pages: 5
Related Papers
50 records in total
  • [1] Deep Speaker Feature Learning for Text-independent Speaker Verification
    Li, Lantian
    Chen, Yixiang
Shi, Ying
    Tang, Zhiyuan
    Wang, Dong
    18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION, 2017, : 1542 - 1546
  • [2] A CORRECTIVE LEARNING APPROACH FOR TEXT-INDEPENDENT SPEAKER VERIFICATION
    Wen, Yandong
    Zhou, Tianyan
    Singh, Rita
    Raj, Bhiksha
    2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018, : 4894 - 4898
  • [3] Strategies for End-to-End Text-Independent Speaker Verification
    Lin, Weiwei
    Mak, Man-Wai
    Chien, Jen-Tzung
    INTERSPEECH 2020, 2020, : 4308 - 4312
  • [4] A tutorial on text-independent speaker verification
    Bimbot, F
    Bonastre, JF
    Fredouille, C
    Gravier, G
    Magrin-Chagnolleau, I
    Meignier, S
    Merlin, T
    Ortega-García, J
    Petrovska-Delacrétaz, D
    Reynolds, DA
    EURASIP JOURNAL ON APPLIED SIGNAL PROCESSING, 2004, 2004 (04) : 430 - 451
  • [5] TEXT-INDEPENDENT SPEAKER VERIFICATION WITH ADVERSARIAL LEARNING ON SHORT UTTERANCES
    Liu, Kai
    Zhou, Huan
    2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 6569 - 6573
  • [6] Text-Independent Speaker Verification Based on Information Theoretic Learning
    Memon, Sheeraz
    Khanzada, Tariq Jameel Saifullah
    Bhatti, Sania
    MEHRAN UNIVERSITY RESEARCH JOURNAL OF ENGINEERING AND TECHNOLOGY, 2011, 30 (03) : 457 - 468
  • [7] Graphical models for text-independent speaker verification
    Sánchez-Soto, E
    Sigelle, M
    Chollet, G
    NONLINEAR SPEECH MODELING AND APPLICATIONS, 2005, 3445 : 410 - 415
  • [8] Language dependency in text-independent speaker verification
    Auckenthaler, R
    Carey, MJ
    Mason, JSD
    2001 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING (ICASSP), VOLS I-VI, PROCEEDINGS, 2001, : 441 - 444