Towards Unsupervised Training of Speaker Independent Acoustic Models

Cited by: 0
Authors
Jansen, Aren [1]
Church, Kenneth [1]
Affiliation
[1] Johns Hopkins Univ, Human Language Technol Ctr Excellence, Baltimore, MD 21218 USA
Keywords
speaker independent acoustic models; unsupervised training; spectral clustering; SPEECH;
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Can we automatically discover speaker independent phoneme-like subword units with zero resources in a surprise language? There have been a number of recent efforts to automatically discover repeated spoken terms without a recognizer. This paper investigates the feasibility of using these results as constraints for unsupervised acoustic model training. We start with a relatively small set of word types, as well as their locations in the speech. The training process assumes that repetitions of the same (unknown) word share the same (unknown) sequence of subword units. For each word type, we train a whole-word hidden Markov model with Gaussian mixture observation densities and collapse correlated states across the word types using spectral clustering. We find that the resulting state clusters align reasonably well along phonetic lines. In evaluating cross-speaker word similarity, the proposed techniques outperform both raw acoustic features and language-mismatched acoustic models.
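The state-collapsing step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say which state-similarity measure or which spectral clustering variant the authors use, so everything here is an assumption: the affinity matrix `demo_affinity` is made-up toy data, the function names are hypothetical, and the normalization follows the common Ng-Jordan-Weiss recipe rather than anything confirmed by the paper.

```python
import numpy as np


def spectral_embedding(affinity, n_clusters):
    """Embed states using the top eigenvectors of D^{-1/2} W D^{-1/2}."""
    degree = affinity.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(degree)
    normalized = affinity * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    # eigh returns eigenvalues in ascending order, so take the last columns.
    _, eigvecs = np.linalg.eigh(normalized)
    top = eigvecs[:, -n_clusters:]
    # Row-normalize so each state lies on the unit sphere.
    return top / np.linalg.norm(top, axis=1, keepdims=True)


def kmeans(points, n_clusters, n_iter=50):
    """Plain k-means with deterministic farthest-point initialization."""
    centers = [points[0]]
    for _ in range(1, n_clusters):
        dists = np.min(
            [np.sum((points - c) ** 2, axis=1) for c in centers], axis=0
        )
        centers.append(points[np.argmax(dists)])
    centers = np.array(centers)
    for _ in range(n_iter):
        sq_dist = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = sq_dist.argmin(axis=1)
        for k in range(n_clusters):
            if np.any(labels == k):
                centers[k] = points[labels == k].mean(axis=0)
    return labels


def cluster_states(affinity, n_clusters):
    """Group HMM states whose pairwise affinity is high."""
    return kmeans(spectral_embedding(affinity, n_clusters), n_clusters)


# Toy data (assumption): six HMM states pooled from several whole-word
# models; states 0-2 are strongly correlated with one another, as are 3-5.
demo_affinity = np.full((6, 6), 0.05)
demo_affinity[:3, :3] = 0.9
demo_affinity[3:, 3:] = 0.9
np.fill_diagonal(demo_affinity, 1.0)

labels = cluster_states(demo_affinity, n_clusters=2)
```

In this toy case the two blocks of correlated states should fall into separate clusters, which is the sense in which states from different word models would be merged into shared subword-like units.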
Pages: 1704-1707
Page count: 4
Related papers
50 in total
  • [31] Towards a Speaker Independent Speech-BCI Using Speaker Adaptation
    Dash, Debadatta
    Wisler, Alan
    Ferrari, Paul
    Wang, Jun
    INTERSPEECH 2019, 2019, : 864 - 868
  • [32] Acoustic Model Training Using Pseudo-Speaker Features Generated by MLLR Transformations for Robust Speaker-Independent Speech Recognition
    Itoh, Arata
    Hara, Sunao
    Kitaoka, Norihide
    Takeda, Kazuya
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2012, E95D (10): : 2479 - 2485
  • [33] Unsupervised Discriminative Training of PLDA for Domain Adaptation in Speaker Verification
    Wang, Qiongqiong
    Koshinaka, Takafumi
    18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION, 2017, : 3727 - 3731
  • [34] Improved Unsupervised NAP Training Dataset Design for Speaker Recognition
    Sun, Hanwu
    Ma, Bin
    14TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2013), VOLS 1-5, 2013, : 1990 - 1994
  • [35] TOWARDS MULTI-SPEAKER UNSUPERVISED SPEECH PATTERN DISCOVERY
    Zhang, Yaodong
    Glass, James R.
    2010 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2010, : 4366 - 4369
  • [36] SPEAKER ADAPTATION BASED ON THE MULTILINEAR DECOMPOSITION OF TRAINING SPEAKER MODELS
    Jeong, Yongwon
    2010 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2010, : 4870 - 4873
  • [37] Speaker Adaptive Training of Deep Neural Network Acoustic Models Using I-Vectors
    Miao, Yajie
    Zhang, Hao
    Metze, Florian
    IEEE Transactions on Audio, Speech and Language Processing, 2015, 23 (11): : 1938 - 1949
  • [38] Speaker Adaptive Training of Deep Neural Network Acoustic Models Using I-Vectors
    Miao, Yajie
    Zhang, Hao
    Metze, Florian
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2015, 23 (11) : 1938 - 1949
  • [39] Speaker adaptive training and mixup regularization for neural network acoustic models in automatic speech recognition
    Tomashenko, Natalia
    Khokhlov, Yuri
    Esteve, Yannick
    19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018, : 2414 - 2418
  • [40] Unsupervised Speaker Adaptation Using Speaker-Class Models for Lecture Speech Recognition
    Kosaka, Tetsuo
    Takeda, Yuui
    Ito, Takashi
    Kato, Masaharu
    Kohda, Masaki
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2010, E93D (09): : 2363 - 2369