Towards Unsupervised Training of Speaker Independent Acoustic Models

Cited by: 0

Authors:

Jansen, Aren [1]

Church, Kenneth [1]

Affiliations:

[1] Johns Hopkins Univ, Human Language Technol Ctr Excellence, Baltimore, MD 21218 USA

Source:

12TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2011 (INTERSPEECH 2011), VOLS 1-5 | 2011

Keywords:

speaker independent acoustic models; unsupervised training; spectral clustering; SPEECH;

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Can we automatically discover speaker independent phoneme-like subword units with zero resources in a surprise language? There have been a number of recent efforts to automatically discover repeated spoken terms without a recognizer. This paper investigates the feasibility of using these results as constraints for unsupervised acoustic model training. We start with a relatively small set of word types, as well as their locations in the speech. The training process assumes that repetitions of the same (unknown) word share the same (unknown) sequence of subword units. For each word type, we train a whole-word hidden Markov model with Gaussian mixture observation densities and collapse correlated states across the word types using spectral clustering. We find that the resulting state clusters align reasonably well along phonetic lines. In evaluating cross-speaker word similarity, the proposed techniques outperform both raw acoustic features and language-mismatched acoustic models.
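The state-collapsing step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say how state similarity is measured or which spectral clustering variant the authors use, so everything here is an assumption: the function name `spectral_cluster` is hypothetical, the similarity matrix `W` is a made-up toy, and the procedure is the common normalized-affinity recipe (leading eigenvectors followed by k-means), not the paper's method.

```python
import numpy as np

def spectral_cluster(similarity, n_clusters, n_iter=50):
    """Group items given a symmetric, non-negative similarity matrix.

    Hypothetical helper: normalized spectral clustering with a small,
    deterministic k-means. Not taken from the paper.
    """
    W = np.asarray(similarity, dtype=float)
    degree = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(degree)
    # Symmetrically normalized affinity: D^{-1/2} W D^{-1/2}
    A = W * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    # eigh returns eigenvalues in ascending order, so the last
    # n_clusters columns are the leading eigenvectors.
    _, vecs = np.linalg.eigh(A)
    X = vecs[:, -n_clusters:]
    # Normalize each row of the embedding to unit length.
    X = X / np.linalg.norm(X, axis=1, keepdims=True)

    # Deterministic farthest-point initialization of the centers.
    centers = [X[0]]
    for _ in range(1, n_clusters):
        dists = np.min([np.sum((X - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(dists)])
    centers = np.array(centers)

    # Plain k-means (Lloyd iterations) in the embedded space.
    for _ in range(n_iter):
        sq_dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = np.argmin(sq_dist, axis=1)
        for k in range(n_clusters):
            if np.any(labels == k):
                centers[k] = X[labels == k].mean(axis=0)
    return labels

# Toy similarity between six HMM states (assumed values): states 0-2
# resemble one another, as do states 3-5, with weak cross-group similarity.
W = np.array([
    [1.00, 0.90, 0.90, 0.05, 0.05, 0.05],
    [0.90, 1.00, 0.90, 0.05, 0.05, 0.05],
    [0.90, 0.90, 1.00, 0.05, 0.05, 0.05],
    [0.05, 0.05, 0.05, 1.00, 0.90, 0.90],
    [0.05, 0.05, 0.05, 0.90, 1.00, 0.90],
    [0.05, 0.05, 0.05, 0.90, 0.90, 1.00],
])
labels = spectral_cluster(W, n_clusters=2)
print(labels)
```

In this toy case the first three states receive one label and the last three another, which mirrors the idea of merging correlated states from different whole-word models into shared subword units.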


Pages: 1704-1707

Page count: 4
