DEEP SPEAKER EMBEDDING LEARNING WITH MULTI-LEVEL POOLING FOR TEXT-INDEPENDENT SPEAKER VERIFICATION

Authors
Tang, Yun [1]
Ding, Guohong [1]
Huang, Jing [1]
He, Xiaodong [1]
Zhou, Bowen [1]
Affiliations
[1] JD AI Res, 675 East Middlefield Rd, Mountain View, CA 94043 USA
Keywords
Speaker recognition; x-vector; multi-level pooling
DOI
Not available
Chinese Library Classification
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
This paper aims to improve the widely used deep speaker embedding x-vector model. We propose the following improvements: (1) a hybrid neural network structure using both a time delay neural network (TDNN) and long short-term memory (LSTM) networks to generate complementary speaker information at different levels; (2) a multi-level pooling strategy to collect speaker information from both TDNN and LSTM layers; (3) a regularization scheme on the speaker embedding extraction layer to make the extracted embeddings suitable for the subsequent fusion step. The synergy of these improvements is shown on the NIST SRE 2016 eval test (with a 19% EER reduction) and the SRE 2018 dev test (with a 9% EER reduction), as well as a reduction of more than 10% in DCF scores on both test sets, all relative to the x-vector baseline.
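The multi-level pooling idea in item (2) can be illustrated with a minimal sketch. The abstract does not state which pooling operator is applied to each layer, so this example assumes the mean-and-standard-deviation statistics pooling used in the standard x-vector model and simply concatenates the pooled statistics from two levels. The names `stats_pool`, `multi_level_pool`, `tdnn_frames`, and `lstm_frames` are hypothetical, and the toy inputs stand in for real layer outputs; this is not the authors' implementation.

```python
import math
from typing import List, Sequence

# A frame-level sequence: num_frames rows, each with feature_dim values.
Frames = Sequence[Sequence[float]]


def stats_pool(frames: Frames) -> List[float]:
    """Pool over time: per-dimension means followed by per-dimension
    (population) standard deviations."""
    if not frames:
        raise ValueError("need at least one frame")
    n = len(frames)
    dim = len(frames[0])
    means = [sum(f[d] for f in frames) / n for d in range(dim)]
    stds = [
        math.sqrt(sum((f[d] - means[d]) ** 2 for f in frames) / n)
        for d in range(dim)
    ]
    return means + stds


def multi_level_pool(tdnn_frames: Frames, lstm_frames: Frames) -> List[float]:
    """Assumed fusion: concatenate the pooled statistics of both levels
    into one utterance-level vector."""
    return stats_pool(tdnn_frames) + stats_pool(lstm_frames)


# Toy stand-ins for TDNN (2-dim) and LSTM (1-dim) frame-level outputs.
tdnn_frames = [[1.0, 2.0], [3.0, 6.0]]
lstm_frames = [[0.0], [0.0], [3.0]]
embedding = multi_level_pool(tdnn_frames, lstm_frames)
print(len(embedding))
```

The resulting vector has twice the TDNN dimension plus twice the LSTM dimension, so the two levels may differ in both frame count and feature width.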
Pages: 6116-6120
Page count: 5