Cross-domain Adaptation with Discrepancy Minimization for Text-independent Forensic Speaker Verification

Cited by: 8
Authors
Wang, Zhenyu [1]
Xia, Wei [1]
Hansen, John H. L. [1]
Affiliations
[1] Univ Texas Dallas, Ctr Robust Speech Syst CRSS, Dallas, TX 75080 USA
Source
Keywords
speaker verification; cross-domain adaptation; discrepancy loss; maximum mean discrepancy; forensics; distribution alignment; RECOGNITION;
DOI
10.21437/Interspeech.2020-2738
Chinese Library Classification (CLC) number
R36 [Pathology]; R76 [Otorhinolaryngology]
Discipline classification code
100104; 100213
Abstract
Forensic audio analysis for speaker verification poses unique challenges due to location/scenario uncertainty and diversity mismatch between reference and naturalistic field recordings. The lack of real naturalistic forensic audio corpora with ground-truth speaker identity represents a major challenge in this field. It is also difficult to directly employ small-scale domain-specific data to train complex neural network architectures, because domain mismatch leads to a loss in performance. Alternatively, cross-domain speaker verification for multiple acoustic environments is a challenging task that could advance research in audio forensics. In this study, we introduce the CRSS-Forensics audio dataset, collected in multiple acoustic environments. We pre-train a CNN-based network using the VoxCeleb data, then fine-tune part of the high-level network layers with clean speech from CRSS-Forensics. Based on this fine-tuned model, we align domain-specific distributions in the embedding space with the discrepancy loss and maximum mean discrepancy (MMD). This maintains effective performance on the clean set while simultaneously generalizing the model to other acoustic domains. The results demonstrate that diverse acoustic environments affect speaker verification performance, and that our proposed cross-domain adaptation approach can significantly improve the results in this scenario.
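The abstract names maximum mean discrepancy (MMD) as one of the alignment criteria but does not specify the kernel or the estimator. As a rough illustration only, the following Python sketch computes a biased squared-MMD estimate with a Gaussian (RBF) kernel on synthetic vectors standing in for speaker embeddings. The function names (`rbf`, `mean_kernel`, `mmd2`, `sample`), the bandwidth `sigma`, and the toy data are assumptions made for this example and are not taken from the paper.

```python
import math
import random


def rbf(x, y, sigma):
    # Gaussian (RBF) kernel between two vectors; sigma is an assumed bandwidth.
    sq_dist = sum((xi - yi) ** 2 for xi, yi in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def mean_kernel(a, b, sigma):
    # Average kernel value over all pairs (one vector from a, one from b).
    total = sum(rbf(x, y, sigma) for x in a for y in b)
    return total / (len(a) * len(b))


def mmd2(source, target, sigma=1.0):
    # Biased estimate of squared MMD: E[k(s,s')] + E[k(t,t')] - 2 E[k(s,t)].
    return (
        mean_kernel(source, source, sigma)
        + mean_kernel(target, target, sigma)
        - 2.0 * mean_kernel(source, target, sigma)
    )


def sample(rng, mean, n, dim):
    # Hypothetical stand-in for embeddings: i.i.d. Gaussian vectors.
    return [[rng.gauss(mean, 1.0) for _ in range(dim)] for _ in range(n)]


rng = random.Random(0)
clean = sample(rng, 0.0, 60, 8)    # "clean domain" embeddings (synthetic)
same = sample(rng, 0.0, 60, 8)     # second draw from the same distribution
shifted = sample(rng, 1.0, 60, 8)  # "other domain" with a shifted mean

mmd_same = mmd2(clean, same, sigma=3.0)
mmd_shifted = mmd2(clean, shifted, sigma=3.0)
print(f"same-domain MMD^2:  {mmd_same:.4f}")
print(f"cross-domain MMD^2: {mmd_shifted:.4f}")
```

In this toy setting the cross-domain value comes out larger than the same-domain value. In an adaptation setup, a term of this kind would typically be added to the training objective so that minimizing it pulls the embedding distributions of the two domains together. How the paper weights or combines it with the discrepancy loss is not stated in the abstract.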
Pages: 2257-2261
Page count: 5