Towards Unsupervised Training of Speaker Independent Acoustic Models

Authors
Jansen, Aren [1]
Church, Kenneth [1]
Affiliations
[1] Johns Hopkins Univ, Human Language Technol Ctr Excellence, Baltimore, MD 21218 USA
Keywords
speaker independent acoustic models; unsupervised training; spectral clustering; SPEECH;
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Can we automatically discover speaker independent phoneme-like subword units with zero resources in a surprise language? There have been a number of recent efforts to automatically discover repeated spoken terms without a recognizer. This paper investigates the feasibility of using these results as constraints for unsupervised acoustic model training. We start with a relatively small set of word types, as well as their locations in the speech. The training process assumes that repetitions of the same (unknown) word share the same (unknown) sequence of subword units. For each word type, we train a whole-word hidden Markov model with Gaussian mixture observation densities and collapse correlated states across the word types using spectral clustering. We find that the resulting state clusters align reasonably well along phonetic lines. In evaluating cross-speaker word similarity, the proposed techniques outperform both raw acoustic features and language-mismatched acoustic models.
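The state-collapsing step can be illustrated with a minimal sketch. This is not the authors' implementation: the record does not say how state similarity is computed, so the example assumes each HMM state is summarized by a made-up mean vector (`state_means`) and uses a Gaussian kernel on those means as a stand-in affinity. The function name `spectral_bipartition` is also hypothetical. It performs only the simplest form of spectral clustering, a two-way split on the second eigenvector of the normalized graph Laplacian.

```python
import numpy as np


def spectral_bipartition(affinity):
    """Split items into two clusters using the eigenvector that belongs to the
    second-smallest eigenvalue of the symmetric normalized graph Laplacian."""
    affinity = np.asarray(affinity, dtype=float)
    degree = affinity.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(degree)
    # L = I - D^{-1/2} A D^{-1/2}
    laplacian = np.eye(len(affinity)) - d_inv_sqrt[:, None] * affinity * d_inv_sqrt[None, :]
    # eigh returns eigenvalues in ascending order for symmetric matrices.
    _, eigvecs = np.linalg.eigh(laplacian)
    fiedler = eigvecs[:, 1]
    # The sign of each component assigns the cluster; which cluster gets
    # label 0 or 1 is arbitrary.
    return (fiedler > 0).astype(int)


# Hypothetical data: six HMM states (from different whole-word models),
# each summarized by a 2-D mean vector. Values are invented for illustration.
state_means = np.array([
    [0.0, 0.1], [0.2, 0.0], [0.1, 0.2],   # states that sound alike
    [2.0, 2.1], [2.2, 1.9], [1.9, 2.0],   # a second group of similar states
])

# Assumed affinity: Gaussian kernel on squared Euclidean distance.
sq_dists = ((state_means[:, None, :] - state_means[None, :, :]) ** 2).sum(axis=-1)
affinity = np.exp(-sq_dists / 2.0)

labels = spectral_bipartition(affinity)
print(labels)
```

States that receive the same label would be merged into one shared subword unit. A multi-way version would keep several eigenvectors and cluster their rows, for example with k-means.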
Pages: 1704-1707
Page count: 4
Related papers
50 in total
  • [21] Unsupervised training of acoustic models for large vocabulary continuous speech recognition
    Wessel, F
    Ney, H
    IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, 2005, 13 (01): : 23 - 31
  • [22] Unsupervised acoustic model training
    Lamel, L
    Gauvain, JL
    Adda, G
    2002 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS I-IV, PROCEEDINGS, 2002, : 877 - 880
  • [23] Unsupervised training of acoustic models for large vocabulary continuous speech recognition
    Wessel, F
    Ney, H
    ASRU 2001: IEEE WORKSHOP ON AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING, CONFERENCE PROCEEDINGS, 2001, : 307 - 310
  • [24] Unsupervised speaker indexing using generic models
    Kwon, S
    Narayanan, S
    IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, 2005, 13 (05): : 1004 - 1013
  • [25] Towards domain independent speaker clustering
    Moh, Y
    Nguyen, P
    Junqua, JC
    2003 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL II, PROCEEDINGS: SPEECH II; INDUSTRY TECHNOLOGY TRACKS; DESIGN & IMPLEMENTATION OF SIGNAL PROCESSING SYSTEMS; NEURAL NETWORKS FOR SIGNAL PROCESSING, 2003, : 85 - 88
  • [26] Unsupervised NAP Training Data Design for Speaker Recognition
    Sun, Hanwu
    Ma, Bin
    13TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2012 (INTERSPEECH 2012), VOLS 1-3, 2012, : 1098 - 1101
  • [27] Unsupervised Speaker Adaptation based on the Cosine Similarity for Text-Independent Speaker Verification
    Shum, Stephen
    Dehak, Najim
    Dehak, Reda
    Glass, James R.
    ODYSSEY 2010: THE SPEAKER AND LANGUAGE RECOGNITION WORKSHOP, 2010, : 76 - 82
  • [28] ON COMBINING I-VECTORS AND DISCRIMINATIVE ADAPTATION METHODS FOR UNSUPERVISED SPEAKER NORMALIZATION IN DNN ACOUSTIC MODELS
    Samarakoon, Lahiru
    Sim, Khe Chai
    2016 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING PROCEEDINGS, 2016, : 5275 - 5279
  • [29] TRAINING OF SPEAKER-CLUSTERED ACOUSTIC MODELS FOR USE IN REAL-TIME RECOGNIZERS
    Vanek, Jan
    Psutka, Josef V.
    Zelinka, Jan
    Prazak, Ales
    Psutka, Josef
    SIGMAP 2009: PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING AND MULTIMEDIA APPLICATIONS, 2009, : 131 - 135
  • [30] Robust bootstrapping algorithm of speaker models for on-line unsupervised speaker indexing
    School of Computer Science, Northwestern Polytechnical University, Xi'an 710072, China
    Ruan Jian Xue Bao, 2007, 3 (608-616):