DEEP SPEAKER EMBEDDING LEARNING WITH MULTI-LEVEL POOLING FOR TEXT-INDEPENDENT SPEAKER VERIFICATION

被引:0
|
作者
Tang, Yun [1 ]
Ding, Guohong [1 ]
Huang, Jing [1 ]
He, Xiaodong [1 ]
Zhou, Bowen [1 ]
机构
[1] JD AI Res, 675 East Middlefield Rd, Mountain View, CA 94043 USA
关键词
Speaker recognition; x-vector; multi-level pooling;
D O I
暂无
中图分类号
O42 [声学];
学科分类号
070206 ; 082403 ;
摘要
This paper aims to improve the widely used deep speaker embedding x-vector model. We propose the following improvements: ( 1) a hybrid neural network structure using both time delay neural network ( TDNN) and long short-term memory neural networks ( LSTM) to generate complementary speaker information at different levels; ( 2) a multi-level pooling strategy to collect speaker information from both TDNN and LSTM layers; ( 3) a regularization scheme on the speaker embedding extraction layer to make the extracted embeddings suitable for the following fusion step. The synergy of these improvements are shown on the NIST SRE 2016 eval test ( with a 19% EER reduction) and SRE 2018 dev test ( with a 9% EER reduction), as well as more than 10% DCF scores reduction on these two test sets over the x-vector baseline.
引用
收藏
页码:6116 / 6120
页数:5
相关论文
共 50 条
  • [11] Vector-Based Attentive Pooling for Text-Independent Speaker Verification
    Wu, Yanfeng
    Guo, Chenkai
    Gao, Hongcan
    Hou, Xiaolei
    Xu, Jing
    INTERSPEECH 2020, 2020, : 936 - 940
  • [12] A tutorial on text-independent speaker verification
    Bimbot, F. (bimbot@irisa.fr), 1600, Hindawi Publishing Corporation (2004):
  • [13] A CORRECTIVE LEARNING APPROACH FOR TEXT-INDEPENDENT SPEAKER VERIFICATION
    Wen, Yandong
    Zhou, Tianyan
    Singh, Rita
    Raj, Bhiksha
    2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018, : 4894 - 4898
  • [14] Mixup Learning Strategies for Text-independent Speaker Verification
    Zhu, Yingke
    Ko, Tom
    Mak, Brian
    INTERSPEECH 2019, 2019, : 4345 - 4349
  • [15] A tutorial on text-independent speaker verification
    Bimbot, F
    Bonastre, JF
    Fredouille, C
    Gravier, G
    Magrin-Chagnolleau, I
    Meignier, S
    Merlin, T
    Ortega-García, J
    Petrovska-Delacrétaz, D
    Reynolds, DA
    EURASIP JOURNAL ON APPLIED SIGNAL PROCESSING, 2004, 2004 (04) : 430 - 451
  • [16] A Tutorial on Text-Independent Speaker Verification
    Frédéric Bimbot
    Jean-François Bonastre
    Corinne Fredouille
    Guillaume Gravier
    Ivan Magrin-Chagnolleau
    Sylvain Meignier
    Teva Merlin
    Javier Ortega-García
    Dijana Petrovska-Delacrétaz
    Douglas A. Reynolds
    EURASIP Journal on Advances in Signal Processing, 2004
  • [17] Deeply Fused Speaker Embeddings for Text-Independent Speaker Verification
    Bhattacharya, Gautam
    Alam, Jahangir
    Gupta, Vishwa
    Kenny, Patrick
    19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018, : 3588 - 3592
  • [18] Discriminative Neural Embedding Learning for Short-Duration Text-Independent Speaker Verification
    Wang, Shuai
    Huang, Zili
    Qian, Yanmin
    Yu, Kai
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2019, 27 (11) : 1686 - 1696
  • [19] HYBRID NETWORK WITH MULTI-LEVEL GLOBAL-LOCAL STATISTICS POOLING FOR ROBUST TEXT-INDEPENDENT SPEAKER RECOGNITION
    Kang, Woo Hyun
    Alam, Jahangir
    Fathan, Abderrahim
    2021 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU), 2021, : 1116 - 1123
  • [20] A ROBUST TEXT-INDEPENDENT SPEAKER VERIFICATION METHOD BASED ON SPEECH SEPARATION AND DEEP SPEAKER
    Zhao, Fei
    Li, Hao
    Zhang, Xueliang
    2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 6101 - 6105