DEEP SPEAKER EMBEDDING LEARNING WITH MULTI-LEVEL POOLING FOR TEXT-INDEPENDENT SPEAKER VERIFICATION

Cited by: 0

Authors:

Tang, Yun [1]

Ding, Guohong [1]

Huang, Jing [1]

He, Xiaodong [1]

Zhou, Bowen [1]

Affiliations:

[1] JD AI Res, 675 East Middlefield Rd, Mountain View, CA 94043 USA

Source:

2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) | 2019

Keywords:

Speaker recognition; x-vector; multi-level pooling

DOI:

Not available

Chinese Library Classification (CLC) number:

O42 [Acoustics]

Discipline classification codes:

070206; 082403

Abstract:

This paper aims to improve the widely used deep speaker embedding x-vector model. We propose the following improvements: (1) a hybrid neural network structure using both a time delay neural network (TDNN) and long short-term memory (LSTM) neural networks to generate complementary speaker information at different levels; (2) a multi-level pooling strategy to collect speaker information from both TDNN and LSTM layers; (3) a regularization scheme on the speaker embedding extraction layer to make the extracted embeddings suitable for the following fusion step. The synergy of these improvements is shown on the NIST SRE 2016 eval test (with a 19% EER reduction) and the SRE 2018 dev test (with a 9% EER reduction), as well as by a reduction of more than 10% in DCF scores on these two test sets over the x-vector baseline.
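The multi-level pooling idea in the abstract can be illustrated with a minimal sketch. It assumes that each level is summarized by statistics pooling (per-dimension mean and standard deviation over frames, as in the standard x-vector model) and that the two summaries are simply concatenated. The abstract does not state how the paper combines the levels, so the function names `stats_pool` and `multi_level_pool`, the concatenation, and the toy inputs are all hypothetical and are not the authors' implementation.

```python
from statistics import fmean, pstdev


def stats_pool(frames):
    """Pool a sequence of frame vectors into per-dimension means followed by std devs."""
    dims = list(zip(*frames))  # transpose: one tuple of values per feature dimension
    return [fmean(d) for d in dims] + [pstdev(d) for d in dims]


def multi_level_pool(tdnn_out, lstm_out):
    """Assumed combination: concatenate the pooled statistics of both levels."""
    return stats_pool(tdnn_out) + stats_pool(lstm_out)


# Toy frame-level outputs (hypothetical values): 2 frames each.
tdnn_out = [[1.0, 2.0], [3.0, 6.0]]  # 2-dimensional TDNN features
lstm_out = [[0.0], [4.0]]            # 1-dimensional LSTM features

pooled = multi_level_pool(tdnn_out, lstm_out)
print(len(pooled))  # 2 * 2 + 2 * 1 = 6 values
```

Because the pooled vector has a fixed length regardless of the number of frames, utterances of different durations map to inputs of the same size for the embedding layer.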


Pages: 6116-6120

Page count: 5


