An Adaptive X-vector Model for Text-independent Speaker Verification

Cited by: 4

Authors:

Gu, Bin [1]

Guo, Wu [1]

Ding, Penguin [1]

Ling, Zhenhua [1]

Du, Jun [1]

Affiliations:

[1] Univ Sci & Technol China, Natl Engn Lab Speech & Language Informat Proc, Hefei, Peoples R China

Source:

INTERSPEECH 2020 | 2020

Funding:

National Natural Science Foundation of China;

Keywords:

Speaker verification; Adaptive convolution; Adaptive batch normalization; Attention mechanism;

DOI:

10.21437/Interspeech.2020-1071

Chinese Library Classification number:

R36 [Pathology]; R76 [Otorhinolaryngology];

Discipline classification code:

100104 ; 100213 ;

Abstract:

In this paper, adaptive mechanisms are applied in deep neural network (DNN) training for x-vector-based text-independent speaker verification. First, adaptive convolutional neural networks (ACNNs) are employed in the frame-level embedding layers, where the parameters of the convolution filters are adjusted based on the input features. Compared with conventional CNNs, ACNNs have more flexibility in capturing speaker information. Moreover, we replace conventional batch normalization (BN) with adaptive batch normalization (ABN). By dynamically generating the scaling and shifting parameters in BN, ABN adapts models to the acoustic variability arising from various factors such as channel and environmental noise. Finally, we combine these two methods to further improve performance. Experiments are carried out on the Speakers in the Wild (SITW) and VOiCES databases. The results demonstrate that the proposed methods significantly outperform the original x-vector approach.
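The idea of generating the BN scaling and shifting parameters dynamically can be illustrated with a minimal NumPy sketch. The abstract does not say how these parameters are produced, so everything specific here is an assumption made for illustration: the function name `adaptive_batch_norm`, the use of each utterance's frame-averaged features as the conditioning input, and the linear projections `w_gamma` and `w_beta`. It should not be read as the authors' implementation.

```python
import numpy as np

def adaptive_batch_norm(x, w_gamma, w_beta, eps=1e-5):
    """Batch normalization with input-dependent scale and shift.

    x:        array of shape (batch, frames, channels)
    w_gamma:  (channels, channels) projection producing the scale (assumed form)
    w_beta:   (channels, channels) projection producing the shift (assumed form)
    """
    # Conventional BN statistics: per channel, over batch and frame axes.
    mean = x.mean(axis=(0, 1), keepdims=True)
    var = x.var(axis=(0, 1), keepdims=True)
    x_hat = (x - mean) / np.sqrt(var + eps)

    # Assumption: each utterance is summarised by its mean over frames.
    context = x.mean(axis=1)                 # (batch, channels)

    # Scale and shift are generated from the input instead of being
    # fixed learned vectors; the scale is centred on 1.
    gamma = 1.0 + context @ w_gamma          # (batch, channels)
    beta = context @ w_beta                  # (batch, channels)

    # Broadcast the per-utterance parameters over the frame axis.
    return gamma[:, None, :] * x_hat + beta[:, None, :]

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 50, 8))              # 4 utterances, 50 frames, 8 channels
w_gamma = 0.01 * rng.normal(size=(8, 8))
w_beta = 0.01 * rng.normal(size=(8, 8))
y = adaptive_batch_norm(x, w_gamma, w_beta)
```

With both projections set to zero, the function reduces to plain normalization with unit scale and zero shift, which is a convenient sanity check for the adaptive part.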

Citation

Pages: 1506-1510

Number of pages: 5

50 items in total

[11] A tutorial on text-independent speaker verification
Bimbot, F
Bonastre, JF
Fredouille, C
Gravier, G
Magrin-Chagnolleau, I
Meignier, S
Merlin, T
Ortega-García, J
Petrovska-Delacrétaz, D
Reynolds, DA
EURASIP JOURNAL ON APPLIED SIGNAL PROCESSING, 2004, 2004 (04) : 430 - 451
[12] A Tutorial on Text-Independent Speaker Verification
Frédéric Bimbot
Jean-François Bonastre
Corinne Fredouille
Guillaume Gravier
Ivan Magrin-Chagnolleau
Sylvain Meignier
Teva Merlin
Javier Ortega-García
Dijana Petrovska-Delacrétaz
Douglas A. Reynolds
EURASIP Journal on Advances in Signal Processing, 2004
[13] Improving X-vector and PLDA for Text-dependent Speaker Verification
Chen, Zhuxin
Lin, Yue
INTERSPEECH 2020, 2020, : 726 - 730
[14] GENERATIVE X-VECTORS FOR TEXT-INDEPENDENT SPEAKER VERIFICATION
Xu, Longting
Das, Rohan Kumar
Yilmaz, Emre
Yang, Jichen
Li, Haizhou
2018 IEEE WORKSHOP ON SPOKEN LANGUAGE TECHNOLOGY (SLT 2018), 2018, : 1014 - 1020
[15] Vector-Based Attentive Pooling for Text-Independent Speaker Verification
Wu, Yanfeng
Guo, Chenkai
Gao, Hongcan
Hou, Xiaolei
Xu, Jing
INTERSPEECH 2020, 2020, : 936 - 940
[16] Context-adaptive Gaussian Attention for Text-independent Speaker Verification
Peng, Junyi
Gu, Rongzhi
Zhang, Haoran
Zou, Yuexian
2020 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2020, : 595 - 599
[17] A novel text-independent speaker verification method based on the global speaker model
Zhang, YY
Zhang, D
Zhu, XY
IEEE TRANSACTIONS ON SYSTEMS MAN AND CYBERNETICS PART A-SYSTEMS AND HUMANS, 2000, 30 (05) : 598 - 602
[18] Graphical models for text-independent speaker verification
Sánchez-Soto, E
Sigelle, M
Chollet, G
NONLINEAR SPEECH MODELING AND APPLICATIONS, 2005, 3445 : 410 - 415
[19] Cross similarity measurement for speaker adaptive test normalization in text-independent speaker verification
ZHAO Jian
The Journal of China Universities of Posts and Telecommunications, 2008, (02) : 130 - 134
[20] Language dependency in text-independent speaker verification
Auckenthaler, R
Carey, MJ
Mason, JSD
2001 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS I-VI, PROCEEDINGS, 2001, : 441 - 444
