Mixup Learning Strategies for Text-independent Speaker Verification

Cited by: 20
Authors
Zhu, Yingke [1 ]
Ko, Tom [2 ]
Mak, Brian [1 ]
Affiliations
[1] Hong Kong Univ Sci & Technol, Dept Comp Sci & Engn, Hong Kong, Peoples R China
[2] South Univ Sci & Technol, Dept Comp Sci & Engn, Shenzhen, Peoples R China
Source: INTERSPEECH 2019
Keywords
speaker recognition; deep neural networks; mixup; x-vectors;
DOI
10.21437/Interspeech.2019-2250
Chinese Library Classification (CLC): R36 [Pathology]; R76 [Otorhinolaryngology]
Discipline codes: 100104; 100213
Abstract
Mixup is a learning strategy that constructs additional virtual training samples from existing ones by linearly interpolating random pairs of them. It has been shown that mixup helps avoid data memorization and thus improves model generalization. This paper investigates the mixup learning strategy for training speaker-discriminative deep neural networks (DNNs) for better text-independent speaker verification. In recent speaker verification systems, a DNN is usually trained to classify the speakers in the training set. At the same time, the DNN learns a low-dimensional speaker embedding, so that embeddings can be generated for any speaker during evaluation. We adapted the mixup strategy to the speaker-discriminative DNN training procedure and studied different mixup schemes, such as performing mixup on MFCC features or on raw audio samples. The mixup learning strategy was evaluated on the NIST SRE 2010, SRE 2016 and SITW evaluation sets. Experimental results show consistent performance improvements of up to 13% relative in terms of both EER and DCF. We further find that mixup training also improves the DNN's speaker classification accuracy consistently, without requiring any additional data sources.
Pages: 4345-4349
Page count: 5
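The abstract above describes mixup as linear interpolation of randomly paired training samples and their labels, applied either to MFCC features or to raw audio. Below is a minimal sketch of that batch-level interpolation, assuming the standard mixup formulation with a mixing coefficient drawn from a Beta(alpha, alpha) distribution; the function name, tensor shapes, and alpha value are illustrative assumptions, not the authors' exact configuration.

# A minimal mixup sketch for speaker-discriminative DNN training data.
# Shapes and alpha are illustrative, not taken from the paper.
import numpy as np

def mixup_batch(feats, labels, alpha=0.2, rng=None):
    """Mix a batch with a randomly permuted copy of itself.

    feats:  (batch, frames, n_mfcc) MFCC chunks (or raw waveform samples).
    labels: (batch, n_speakers) one-hot speaker targets.
    Returns interpolated features and soft labels:
        x~ = lam * x_i + (1 - lam) * x_j
        y~ = lam * y_i + (1 - lam) * y_j,   lam ~ Beta(alpha, alpha)
    """
    rng = rng or np.random.default_rng()
    lam = rng.beta(alpha, alpha)           # one mixing coefficient per batch
    perm = rng.permutation(len(feats))     # random pairing within the batch
    mixed_feats = lam * feats + (1.0 - lam) * feats[perm]
    mixed_labels = lam * labels + (1.0 - lam) * labels[perm]
    return mixed_feats, mixed_labels

# Example: mix a toy batch of 4 utterance chunks (200 frames x 30 MFCCs)
# drawn from a hypothetical 10-speaker training set.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feats = rng.standard_normal((4, 200, 30)).astype(np.float32)
    labels = np.eye(10, dtype=np.float32)[rng.integers(0, 10, size=4)]
    x, y = mixup_batch(feats, labels, alpha=0.2, rng=rng)
    print(x.shape, y.shape, y[0])          # soft labels sum to 1 per row

The mixed features are then fed to the speaker-classification DNN with the soft labels as targets; performing the same interpolation on raw audio instead of MFCCs corresponds to the alternative mixup scheme mentioned in the abstract.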
Related papers (50 in total; items [21]-[30] shown below)
  • [21] Residual Factor Analysis for Text-independent Speaker Verification
    Zhu, Lei
    Zheng, Rong
    Xu, Bo
    PROCEEDINGS OF THE 2009 CHINESE CONFERENCE ON PATTERN RECOGNITION AND THE FIRST CJK JOINT WORKSHOP ON PATTERN RECOGNITION, VOLS 1 AND 2, 2009, : 964 - 968
  • [22] CHANNEL ADAPTATION OF PLDA FOR TEXT-INDEPENDENT SPEAKER VERIFICATION
    Chen, Liping
    Lee, Kong Aik
    Ma, Bin
    Guo, Wu
    Li, Haizhou
    Dai, Li Rong
    2015 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING (ICASSP), 2015, : 5251 - 5255
  • [23] Neural Embedding Extractors for Text-Independent Speaker Verification
    Alam, Jahangir
    Kang, Woohyun
    Fathan, Abderrahim
    SPEECH AND COMPUTER, SPECOM 2022, 2022, 13721 : 10 - 23
  • [24] A text-independent speaker verification model: A comparative analysis
    Charan, Rishi
    Manisha, A.
    Karthik, R.
    Kumar, Rajesh M.
    PROCEEDINGS OF 2017 INTERNATIONAL CONFERENCE ON INTELLIGENT COMPUTING AND CONTROL (I2C2), 2017,
  • [25] CNN WITH PHONETIC ATTENTION FOR TEXT-INDEPENDENT SPEAKER VERIFICATION
    Zhou, Tianyan
    Zhao, Yong
    Li, Jinyu
    Gong, Yifan
    Wu, Jian
    2019 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU 2019), 2019, : 718 - 725
  • [26] Text-Independent Speaker Verification with Dual Attention Network
    Li, Jingyu
    Lee, Tan
    INTERSPEECH 2020, 2020, : 956 - 960
  • [27] Influence of task duration in text-independent speaker verification
    Fauve, Benoit
    Evans, Nicholas
    Pearson, Neil
    Bonastre, Jean-Francois
    Mason, John
    INTERSPEECH 2007: 8TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION, VOLS 1-4, 2007, : 2728 - +
  • [28] Score normalization for text-independent speaker verification systems
    Auckenthaler, R
    Carey, M
    Lloyd-Thomas, H
    DIGITAL SIGNAL PROCESSING, 2000, 10 (1-3) : 42 - 54
  • [29] Exploration of Local Variability in Text-Independent Speaker Verification
    Chen, Liping
    Lee, Kong Aik
    Ma, Bin
    Guo, Wu
    Li, Haizhou
    Dai, Li-Rong
    JOURNAL OF SIGNAL PROCESSING SYSTEMS FOR SIGNAL IMAGE AND VIDEO TECHNOLOGY, 2016, 82 (02): : 217 - 228
  • [30] A robust sequential test for text-independent speaker verification
    Lund, MA
    Lee, CC
    JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 1996, 99 (01): : 609 - 621