Mixup Learning Strategies for Text-independent Speaker Verification

Cited by: 20
Authors
Zhu, Yingke [1 ]
Ko, Tom [2 ]
Mak, Brian [1 ]
Affiliations
[1] Hong Kong Univ Sci & Technol, Dept Comp Sci & Engn, Hong Kong, Peoples R China
[2] South Univ Sci & Technol, Dept Comp Sci & Engn, Shenzhen, Peoples R China
Source: INTERSPEECH 2019
Keywords
speaker recognition; deep neural networks; mixup; x-vectors;
DOI
10.21437/Interspeech.2019-2250
Chinese Library Classification (CLC): R36 [Pathology]; R76 [Otorhinolaryngology]
Discipline codes: 100104; 100213
Abstract
Mixup is a learning strategy that constructs additional virtual training samples from existing ones by linearly interpolating random pairs of them. It has been shown that mixup helps avoid data memorization and thus improves model generalization. This paper investigates the mixup learning strategy for training speaker-discriminative deep neural networks (DNNs) for better text-independent speaker verification. In recent speaker verification systems, a DNN is usually trained to classify the speakers in the training set. At the same time, the DNN learns a low-dimensional speaker embedding, so that embeddings can be generated for any speaker during evaluation. We adapted the mixup strategy to the speaker-discriminative DNN training procedure and studied different mixup schemes, such as performing mixup on MFCC features or on raw audio samples. The mixup learning strategy was evaluated on the NIST SRE 2010, SRE 2016 and SITW evaluation sets. Experimental results show consistent performance improvements of up to 13% relative in terms of both EER and DCF. We further find that mixup training also improves the DNN's speaker classification accuracy consistently, without requiring any additional data sources.
Pages: 4345-4349
Page count: 5
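The abstract above describes mixup as linear interpolation of randomly paired training samples and their labels, applied either to MFCC features or to raw audio. Below is a minimal sketch of that batch-level interpolation, assuming the standard mixup formulation with a mixing coefficient drawn from a Beta(alpha, alpha) distribution; the function name, tensor shapes, and alpha value are illustrative assumptions, not the authors' exact configuration.

# A minimal mixup sketch for speaker-discriminative DNN training data.
# Shapes and alpha are illustrative, not taken from the paper.
import numpy as np

def mixup_batch(feats, labels, alpha=0.2, rng=None):
    """Mix a batch with a randomly permuted copy of itself.

    feats:  (batch, frames, n_mfcc) MFCC chunks (or raw waveform samples).
    labels: (batch, n_speakers) one-hot speaker targets.
    Returns interpolated features and soft labels:
        x~ = lam * x_i + (1 - lam) * x_j
        y~ = lam * y_i + (1 - lam) * y_j,   lam ~ Beta(alpha, alpha)
    """
    rng = rng or np.random.default_rng()
    lam = rng.beta(alpha, alpha)           # one mixing coefficient per batch
    perm = rng.permutation(len(feats))     # random pairing within the batch
    mixed_feats = lam * feats + (1.0 - lam) * feats[perm]
    mixed_labels = lam * labels + (1.0 - lam) * labels[perm]
    return mixed_feats, mixed_labels

# Example: mix a toy batch of 4 utterance chunks (200 frames x 30 MFCCs)
# drawn from a hypothetical 10-speaker training set.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feats = rng.standard_normal((4, 200, 30)).astype(np.float32)
    labels = np.eye(10, dtype=np.float32)[rng.integers(0, 10, size=4)]
    x, y = mixup_batch(feats, labels, alpha=0.2, rng=rng)
    print(x.shape, y.shape, y[0])          # soft labels sum to 1 per row

The mixed features are then fed to the speaker-classification DNN with the soft labels as targets; performing the same interpolation on raw audio instead of MFCCs corresponds to the alternative mixup scheme mentioned in the abstract.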
Related papers (50 in total; items [21]-[30] shown below)
  • [21] Residual Factor Analysis for Text-independent Speaker Verification
    Zhu, Lei
    Zheng, Rong
    Xu, Bo
    PROCEEDINGS OF THE 2009 CHINESE CONFERENCE ON PATTERN RECOGNITION AND THE FIRST CJK JOINT WORKSHOP ON PATTERN RECOGNITION, VOLS 1 AND 2, 2009, : 964 - 968
  • [22] CHANNEL ADAPTATION OF PLDA FOR TEXT-INDEPENDENT SPEAKER VERIFICATION
    Chen, Liping
    Lee, Kong Aik
    Ma, Bin
    Guo, Wu
    Li, Haizhou
    Dai, Li Rong
    2015 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING (ICASSP), 2015, : 5251 - 5255
  • [23] Neural Embedding Extractors for Text-Independent Speaker Verification
    Alam, Jahangir
    Kang, Woohyun
    Fathan, Abderrahim
    SPEECH AND COMPUTER, SPECOM 2022, 2022, 13721 : 10 - 23
  • [24] A text-independent speaker verification model: A comparative analysis
    Charan, Rishi
    Manisha, A.
    Karthik, R.
    Kumar, Rajesh M.
    PROCEEDINGS OF 2017 INTERNATIONAL CONFERENCE ON INTELLIGENT COMPUTING AND CONTROL (I2C2), 2017,
  • [25] CNN WITH PHONETIC ATTENTION FOR TEXT-INDEPENDENT SPEAKER VERIFICATION
    Zhou, Tianyan
    Zhao, Yong
    Li, Jinyu
    Gong, Yifan
    Wu, Jian
    2019 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU 2019), 2019, : 718 - 725
  • [26] Text-Independent Speaker Verification with Dual Attention Network
    Li, Jingyu
    Lee, Tan
    INTERSPEECH 2020, 2020, : 956 - 960
  • [27] Influence of task duration in text-independent speaker verification
    Fauve, Benoit
    Evans, Nicholas
    Pearson, Neil
    Bonastre, Jean-Francois
    Mason, John
    INTERSPEECH 2007: 8TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION, VOLS 1-4, 2007, : 2728 - +
  • [28] Score normalization for text-independent speaker verification systems
    Auckenthaler, R
    Carey, M
    Lloyd-Thomas, H
    DIGITAL SIGNAL PROCESSING, 2000, 10 (1-3) : 42 - 54
  • [29] Exploration of Local Variability in Text-Independent Speaker Verification
    Chen, Liping
    Lee, Kong Aik
    Ma, Bin
    Guo, Wu
    Li, Haizhou
    Dai, Li-Rong
    JOURNAL OF SIGNAL PROCESSING SYSTEMS FOR SIGNAL IMAGE AND VIDEO TECHNOLOGY, 2016, 82 (02): : 217 - 228
  • [30] A robust sequential test for text-independent speaker verification
    Lund, MA
    Lee, CC
    JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 1996, 99 (01): : 609 - 621