An Adaptive X-vector Model for Text-independent Speaker Verification

Cited by: 4
Authors
Gu, Bin [1]
Guo, Wu [1]
Ding, Fenglin [1]
Ling, Zhenhua [1]
Du, Jun [1]
Affiliations
[1] University of Science and Technology of China, National Engineering Laboratory for Speech and Language Information Processing, Hefei, People's Republic of China
Source
INTERSPEECH 2020
Funding
National Natural Science Foundation of China;
Keywords
Speaker verification; Adaptive convolution; Adaptive batch normalization; Attention mechanism;
DOI
10.21437/Interspeech.2020-1071
Chinese Library Classification (CLC) numbers
R36 [Pathology]; R76 [Otorhinolaryngology];
Discipline classification codes
100104 ; 100213 ;
Abstract
In this paper, adaptive mechanisms are applied in deep neural network (DNN) training for x-vector-based text-independent speaker verification. First, adaptive convolutional neural networks (ACNNs) are employed in the frame-level embedding layers, where the parameters of the convolution filters are adjusted based on the input features. Compared with conventional CNNs, ACNNs have more flexibility in capturing speaker information. Second, we replace conventional batch normalization (BN) with adaptive batch normalization (ABN). By dynamically generating the scaling and shifting parameters in BN, ABN adapts models to the acoustic variability arising from factors such as channel and environmental noise. Finally, we combine these two methods to further improve performance. Experiments are carried out on the Speakers in the Wild (SITW) and VOiCES databases. The results demonstrate that the proposed methods significantly outperform the original x-vector approach.
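The idea behind adaptive batch normalization can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract only states that the scaling and shifting parameters are generated dynamically, so the choice of a time-averaged context vector, the linear generators `w_gamma` and `w_beta`, and the function name `adaptive_batch_norm` are all assumptions made for illustration.

```python
import numpy as np


def adaptive_batch_norm(x, w_gamma, w_beta, eps=1e-5):
    """Illustrative adaptive batch normalization (assumed form, not the paper's).

    x:       frame-level features, shape (batch, frames, channels)
    w_gamma: hypothetical weights generating the per-utterance scale, (channels, channels)
    w_beta:  hypothetical weights generating the per-utterance shift, (channels, channels)
    """
    # Standard BN step: normalize each channel over the batch and time axes.
    mean = x.mean(axis=(0, 1), keepdims=True)
    var = x.var(axis=(0, 1), keepdims=True)
    x_hat = (x - mean) / np.sqrt(var + eps)

    # Assumption: summarize each utterance by its time-averaged features.
    context = x.mean(axis=1)  # (batch, channels)

    # Generate scale and shift from the input instead of using fixed parameters.
    gamma = 1.0 + context @ w_gamma  # (batch, channels)
    beta = context @ w_beta          # (batch, channels)

    # Broadcast the per-utterance parameters over the frame axis.
    return gamma[:, None, :] * x_hat + beta[:, None, :]


# Usage with random data and small random generator weights.
rng = np.random.default_rng(0)
features = rng.normal(size=(4, 50, 8))
w_g = 0.01 * rng.normal(size=(8, 8))
w_b = 0.01 * rng.normal(size=(8, 8))
out = adaptive_batch_norm(features, w_g, w_b)
```

With both weight matrices set to zero, the sketch reduces to plain normalization with unit scale and zero shift, which makes the difference from conventional BN easy to see: in conventional BN the scale and shift are fixed after training, whereas here they depend on each input utterance.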
Pages: 1506 - 1510
Number of pages: 5