Robust model for speaker verification against session-dependent utterance variation

Citations: 0
Authors
Matsui, T [1 ]
Aikawa, K
Affiliations
[1] Inst Stat Math, Tokyo 1068569, Japan
[2] NTT Corp, NTT Commun Sci Labs, Tokyo 1008116, Japan
Source
Keywords
speaker verification; speaker model; session dependent; utterance variation; handset dependent distortion;
DOI
Not available
CLC Classification Number
TP [Automation Technology, Computer Technology]
Subject Classification Code
0812
Abstract
This paper investigates a new method for creating robust speaker models that cope with inter-session variation of a speaker in a continuous HMM-based speaker verification system. The method estimates session-independent parameters by decomposing inter-session variation into two distinct parts: session-dependent and session-independent. The parameters of the speaker models are estimated using the speaker adaptive training algorithm in conjunction with equalization of the session-dependent variation. The resulting models capture session-independent speaker characteristics more reliably than conventional models, and their discriminative power improves accordingly. Moreover, we have made our models more invariant to handset variations in a public switched telephone network (PSTN) by treating session-dependent variation and handset-dependent distortion separately. Text-independent speech data recorded by 20 speakers in seven sessions over 16 months was used to evaluate the new approach. The proposed method reduces the error rate by a relative 15%. Compared with the popular cepstral mean normalization, the error rate is reduced by a relative 24% when the speaker models are recreated using speech data recorded in four or more sessions.
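The abstract's central idea, separating a speaker's features into a session-independent component and per-session offsets before training, can be illustrated with a minimal sketch. This is a hypothetical single-Gaussian, cepstral-domain simplification, not the authors' implementation: the paper uses continuous HMMs trained with the speaker adaptive training algorithm, and the function names and synthetic data below are assumptions made for illustration.

# Hypothetical sketch: decompose per-session cepstral features into a
# session-independent speaker mean plus session-dependent offsets, then
# equalize the offsets before model training. The paper's actual method
# applies this idea within HMM-based speaker adaptive training.
import numpy as np

def equalize_session_variation(sessions):
    """sessions: list of (n_frames_i, dim) cepstral arrays, one per recording session.

    Returns the session-independent speaker mean, the per-session offsets
    (session-dependent variation), and the offset-equalized features.
    """
    all_frames = np.concatenate(sessions, axis=0)
    speaker_mean = all_frames.mean(axis=0)                               # session-independent part
    session_offsets = [s.mean(axis=0) - speaker_mean for s in sessions]  # session-dependent part
    equalized = [s - b for s, b in zip(sessions, session_offsets)]
    return speaker_mean, session_offsets, equalized

def cmn(utterance):
    """Conventional cepstral mean normalization, shown for contrast: it subtracts
    the per-utterance mean, removing session/channel effects and speaker-specific
    cepstral offsets alike, whereas the equalization above retains the shared
    speaker mean."""
    return utterance - utterance.mean(axis=0)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Three synthetic "sessions" of 12-dimensional cepstra with different session offsets.
    sessions = [rng.normal(loc=offset, scale=1.0, size=(200, 12))
                for offset in (0.0, 0.5, -0.3)]
    mean, offsets, equalized = equalize_session_variation(sessions)
    print(mean.shape, [round(float(o[0]), 2) for o in offsets])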
Pages: 712-718
Number of pages: 7