Unsupervised Adaptive Speaker Recognition by Coupling-Regularized Optimal Transport

Cited by: 0
Authors
Zhang, Ruiteng [1]
Wei, Jianguo [1, 2]
Lu, Xugang [3]
Lu, Wenhuan [1]
Jin, Di [1]
Zhang, Lin [4]
Xu, Junhai [1]
Affiliations
[1] Tianjin Univ, Coll Intelligence & Comp, Tianjin 300350, Peoples R China
[2] Qinghai Nationalities Univ, Comp Coll, Xining 810007, Peoples R China
[3] Natl Inst Informat & Commun Technol, Kyoto 6190289, Japan
[4] Brno Univ Technol, Brno 61266, Czech Republic
Funding
National Natural Science Foundation of China
Keywords
Speaker recognition; unsupervised domain adaptation; optimal transport; coupling regularization; DOMAIN ADAPTATION; NEURAL-NETWORKS;
DOI
10.1109/TASLP.2024.3426934
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Cross-domain speaker recognition (SR) can be improved by unsupervised domain adaptation (UDA) algorithms. UDA algorithms often reduce domain mismatch at the cost of decreasing the discrimination of speaker features. In contrast, optimal transport (OT) has the potential to achieve domain alignment while preserving the speaker discrimination capability in UDA applications; however, naively applying OT to measure global probability distribution discrepancies between the source and target domains may induce negative transports where samples belonging to different speakers are coupled in transportation. These negative transports reduce the SR model's discriminative power, degrading the SR performance. This paper proposes a coupling-regularized optimal transport (CROT) algorithm for cross-domain SR to reduce the negative transport during UDA. In the proposed CROT, two consecutive processing modules regularize the coupling paths for the OT solution: a progressive inter-speaker constraint (PISC) module and a coupling-smoothed regularization (CSR) module. The PISC, designed as a pseudo-label memory bank with curriculum learning, is first applied to select valid samples to guarantee that coupling samples are from the same speaker. The CSR, designed to control the information entropy of the coupling paths further, reduces the effect of negative transport in UDA. To evaluate the effectiveness of the proposed algorithm, cross-domain SR experiments were conducted under different target domains, speaker encoders, corpora, and acoustic features. Experimental results showed that CROT achieved a 50% relative reduction in equal error rates compared to conventional OT-based UDAs, outperforming the state-of-the-art UDAs.
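The notion of negative transport in the abstract can be illustrated with a small sketch. This is not the authors' CROT, PISC, or CSR implementation: it is a generic entropy-regularized OT solver (Sinkhorn iterations) on synthetic 2-D "embeddings", with a hypothetical label mask that blocks couplings between different speakers. The function names, the toy data, the penalty value, and the use of true labels as a stand-in for pseudo-labels are all assumptions made for illustration.

```python
import numpy as np

def sinkhorn(cost, eps=0.05, n_iter=500):
    """Generic entropy-regularized OT between two uniform marginals."""
    n, m = cost.shape
    a = np.full(n, 1.0 / n)          # source marginal
    b = np.full(m, 1.0 / m)          # target marginal
    K = np.exp(-cost / eps)          # Gibbs kernel; larger eps gives a smoother plan
    u = np.ones(n)
    v = np.ones(m)
    for _ in range(n_iter):
        u = a / (K @ v)
        v = b / (K.T @ u)
    return u[:, None] * K * v[None, :]   # transport plan (coupling matrix)

def negative_transport_mass(plan, src_labels, tgt_labels):
    """Total mass coupled between samples of different speakers."""
    same = src_labels[:, None] == tgt_labels[None, :]
    return float(plan[~same].sum())

# Toy data (assumption): two speakers, four samples each, shifted target domain.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [3.0, 0.0]])
src_labels = np.repeat([0, 1], 4)
tgt_labels = np.repeat([0, 1], 4)    # stand-in for pseudo-labels
src = centers[src_labels] + 0.3 * rng.standard_normal((8, 2))
tgt = centers[tgt_labels] + np.array([1.0, 0.5]) + 0.3 * rng.standard_normal((8, 2))

# Squared Euclidean cost, scaled to [0, 1] for numerical stability.
cost = ((src[:, None, :] - tgt[None, :, :]) ** 2).sum(axis=-1)
cost = cost / cost.max()

# Plain OT: every pair receives some mass, including cross-speaker pairs.
plan_plain = sinkhorn(cost)

# Masked OT: a large penalty (hypothetical choice) forbids cross-speaker couplings.
same = src_labels[:, None] == tgt_labels[None, :]
plan_masked = sinkhorn(np.where(same, cost, cost + 1e3))

neg_plain = negative_transport_mass(plan_plain, src_labels, tgt_labels)
neg_masked = negative_transport_mass(plan_masked, src_labels, tgt_labels)
print(f"cross-speaker mass, plain OT:  {neg_plain:.3e}")
print(f"cross-speaker mass, masked OT: {neg_masked:.3e}")
```

In this sketch the mask plays a role loosely analogous to restricting couplings to same-speaker pairs, and `eps` controls how spread out (high-entropy) the plan is. How the paper actually selects samples and regularizes the coupling is described only at the level of the abstract above.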
Pages: 3603-3617
Number of pages: 15