EMPLOYMENT OF SUBSPACE GAUSSIAN MIXTURE MODELS IN SPEAKER RECOGNITION

Cited by: 0

Authors:

Motlicek, Petr [1]

Dey, Subhadeep [1,2]

Madikeri, Srikanth [1]

Burget, Lukas [3]

Affiliations:

[1] Idiap Res Inst, Martigny, Switzerland

[2] Ecole Polytech Fed Lausanne, Lausanne, Switzerland

[3] Brno Univ Technol, Brno, Czech Republic

Source:

2015 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING (ICASSP) | 2015

Keywords:

speaker recognition; i-vectors; subspace Gaussian mixture models; automatic speech recognition;

DOI:

Not available

Chinese Library Classification:

O42 [Acoustics];

Discipline classification codes:

070206 ; 082403 ;

Abstract:

This paper presents a Subspace Gaussian Mixture Model (SGMM) approach, employed as a probabilistic generative model to estimate speaker vector representations that are subsequently used in the speaker verification task. SGMMs have already been shown to significantly outperform traditional HMM/GMMs in Automatic Speech Recognition (ASR) applications. An extension to the basic SGMM framework allows robust estimation of low-dimensional speaker vectors, which can then be exploited for speaker adaptation. We propose a speaker verification framework based on low-dimensional speaker vectors estimated using SGMMs trained in an ASR manner using manual transcriptions. To test the robustness of the system, we evaluate the proposed approach against a state-of-the-art i-vector extractor on the NIST SRE 2010 evaluation set under four utterance-length conditions: 3 sec to 10 sec, 10 sec to 30 sec, 30 sec to 60 sec, and full (untruncated) utterances. Experimental results reveal that while the i-vector system performs better on truncated 3 sec to 10 sec and 10 sec to 30 sec utterances, noticeable improvements are observed with SGMMs, especially on full-length utterances. Finally, the proposed SGMM approach exhibits complementary properties and can thus be efficiently fused with an i-vector based speaker verification system.

Pages: 4445-4449

Page count: 5
