DEEP SPEAKER EMBEDDING LEARNING WITH MULTI-LEVEL POOLING FOR TEXT-INDEPENDENT SPEAKER VERIFICATION

Cited by: 0

Authors:

Tang, Yun [1]

Ding, Guohong [1]

Huang, Jing [1]

He, Xiaodong [1]

Zhou, Bowen [1]

Affiliations:

[1] JD AI Res, 675 East Middlefield Rd, Mountain View, CA 94043 USA

Source:

2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) | 2019

Keywords:

Speaker recognition; x-vector; multi-level pooling

DOI:

Not available

Chinese Library Classification number:

O42 [Acoustics]

Discipline classification code:

070206; 082403

Abstract:

This paper aims to improve the widely used deep speaker embedding x-vector model. We propose the following improvements: (1) a hybrid neural network structure using both time delay neural network (TDNN) and long short-term memory (LSTM) layers to generate complementary speaker information at different levels; (2) a multi-level pooling strategy to collect speaker information from both TDNN and LSTM layers; (3) a regularization scheme on the speaker embedding extraction layer to make the extracted embeddings suitable for the following fusion step. The synergy of these improvements is shown on the NIST SRE 2016 eval test (with a 19% EER reduction) and the SRE 2018 dev test (with a 9% EER reduction), together with a reduction of more than 10% in DCF scores on these two test sets over the x-vector baseline.
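
The multi-level pooling idea in point (2) can be illustrated with a minimal NumPy sketch. It assumes mean-and-standard-deviation statistics pooling (as in the standard x-vector model) and simple concatenation of the pooled vectors; the abstract does not specify either detail, and the function names and array sizes below are hypothetical, not taken from the paper.

```python
import numpy as np

def stats_pool(frames):
    """Mean and standard deviation over the time axis of a (T, D) array."""
    return np.concatenate([frames.mean(axis=0), frames.std(axis=0)])

def multi_level_pool(tdnn_out, lstm_out):
    """Pool each layer's frame-level outputs separately, then concatenate
    the results into one utterance-level vector (assumed combination rule)."""
    return np.concatenate([stats_pool(tdnn_out), stats_pool(lstm_out)])

rng = np.random.default_rng(0)
tdnn_out = rng.standard_normal((200, 8))  # hypothetical TDNN outputs: 200 frames, 8 dims
lstm_out = rng.standard_normal((200, 4))  # hypothetical LSTM outputs: 200 frames, 4 dims
pooled = multi_level_pool(tdnn_out, lstm_out)
print(pooled.shape)
```

Because pooling collapses the time axis, the pooled vector has a fixed length (here 2 x 8 + 2 x 4 = 24) regardless of utterance duration, which is what allows a fixed-size speaker embedding to be extracted from variable-length speech.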

Citation

Pages: 6116-6120

Page count: 5

50 records in total

[1] Deep Speaker Feature Learning for Text-independent Speaker Verification
Li, Lantian
Chen, Yixiang
Shi, Ying
Tang, Zhiyuan
Wang, Dong
18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION, 2017: 1542 - 1546
[2] Deep Speaker Embedding with Long Short Term Centroid Learning for Text-independent Speaker Verification
Peng, Junyi
Gu, Rongzhi
Zou, Yuexian
INTERSPEECH 2020, 2020: 3246 - 3250
[3] On Metric-based Deep Embedding Learning for Text-Independent Speaker Verification
Kashani, Hamidreza Baradaran
Reza, Shaghayegh
Rezaei, Iman Sarraf
2020 6TH IRANIAN CONFERENCE ON SIGNAL PROCESSING AND INTELLIGENT SYSTEMS (ICSPIS), 2020
[4] Deep multi-metric learning for text-independent speaker verification
Xu, Jiwei
Wang, Xinggang
Feng, Bin
Liu, Wenyu
NEUROCOMPUTING, 2020, 410 : 394 - 400
[5] Improving the Generalized Performance of Deep Embedding for Text-Independent Speaker Verification
Li, Rongjin
Li, Lin
Hong, Qingyang
Guo, Huiyang
Zhao, Miao
PROCEEDINGS OF 2018 12TH IEEE INTERNATIONAL CONFERENCE ON ANTI-COUNTERFEITING, SECURITY, AND IDENTIFICATION (ASID), 2018: 21 - 25
[6] Neural Embedding Extractors for Text-Independent Speaker Verification
Alam, Jahangir
Kang, Woohyun
Fathan, Abderrahim
SPEECH AND COMPUTER, SPECOM 2022, 2022, 13721 : 10 - 23
[7] A Study on Angular Based Embedding Learning for Text-independent Speaker Verification
Chen, Zhiyong
Ren, Zongze
Xu, Shugong
2019 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2019: 445 - 449
[8] Sequential Speaker Embedding and Transfer Learning for Text-Independent Speaker Identification
Hong, Qian-Bei
Wu, Chung-Hsien
Su, Ming-Hsiang
Wang, Hsin-Min
2019 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2019: 827 - 832
[9] Triplet Based Embedding Distance and Similarity Learning for Text-independent Speaker Verification
Ren, Zongze
Chen, Zhiyong
Xu, Shugong
2019 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2019: 558 - 562
[10] DeltaVLAD: An efficient optimization algorithm to discriminate speaker embedding for text-independent speaker verification
Guo, Xin
Luo, Chengfang
Deng, Aiwen
Deng, Feiqi
AIMS MATHEMATICS, 2022, 7 (04): 6381 - 6395
