Towards Unsupervised Training of Speaker Independent Acoustic Models

被引:0
|
作者
Jansen, Aren [1 ]
Church, Kenneth [1 ]
机构
[1] Johns Hopkins Univ, Human Language Technol Ctr Excellence, Baltimore, MD 21218 USA
关键词
speaker independent acoustic models; unsupervised training; spectral clustering; SPEECH;
D O I
暂无
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
Can we automatically discover speaker independent phoneme-like subword units with zero resources in a surprise language? There have been a number of recent efforts to automatically discover repeated spoken terms without a recognizer. This paper investigates the feasibility of using these results as constraints for unsupervised acoustic model training. We start with a relatively small set of word types, as well as their locations in the speech. The training process assumes that repetitions of the same (unknown) word share the same (unknown) sequence of subword units. For each word type, we train a whole-word hidden Markov model with Gaussian mixture observation densities and collapse correlated states across the word types using spectral clustering. We find that the resulting state clusters align reasonably well along phonetic lines. In evaluating cross-speaker word similarity, the proposed techniques outperform both raw acoustic features and language-mismatched acoustic models.
引用
收藏
页码:1704 / 1707
页数:4
相关论文
共 50 条
  • [1] Unsupervised speaker adaptation for speaker independent acoustic to articulatory speech inversion
    Sivaraman, Ganesh
    Mitra, Vikramjit
    Nam, Hosung
    Tiede, Mark
    Espy-Wilson, Carol
    JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 2019, 146 (01): : 316 - 329
  • [2] Batch Normalization based Unsupervised Speaker Adaptation for Acoustic Models
    Yi, Jiangyan
    Tao, Jianhua
    2019 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2019, : 176 - 180
  • [3] A Speaker Adaptive DNN Training Approach for Speaker-independent Acoustic Inversion
    Badino, Leonardo
    Franceschi, Luca
    Arora, Raman
    Donini, Michele
    Pontil, Massimiliano
    18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION, 2017, : 984 - 988
  • [4] Unsupervised versus Supervised Training of Acoustic Models
    Ma, Jeff
    Schwartz, Richard
    INTERSPEECH 2008: 9TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2008, VOLS 1-5, 2008, : 2374 - 2377
  • [5] UNSUPERVISED SPEAKER ADAPTATION OF BATCH NORMALIZED ACOUSTIC MODELS FOR ROBUST ASR
    Wang, Zhong-Qiu
    Wang, DeLiang
    2017 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2017, : 4890 - 4894
  • [6] Reducing the Inter-speaker Variance of CNN Acoustic Models Using Unsupervised Adversarial Multi-task Training
    Toth, Aszlo
    Gosztolya, Gabor
    SPEECH AND COMPUTER, SPECOM 2019, 2019, 11658 : 481 - 490
  • [7] Towards an Accurate Speaker-Independent Holy Quran Acoustic Model
    El Amrani, Mohamed Yassine
    Rahman, M. M. Hafizur
    Wahiddin, Mohamed Ridza
    Shah, Asadullah
    2017 4TH IEEE INTERNATIONAL CONFERENCE ON ENGINEERING TECHNOLOGIES AND APPLIED SCIENCES (ICETAS), 2017,
  • [8] SPEAKER DIARIZATION WITH UNSUPERVISED TRAINING FRAMEWORKL
    Le Lan, Gael
    Meignier, Sylvain
    Charlet, Delphine
    Deleglise, Paul
    2016 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING PROCEEDINGS, 2016, : 5560 - 5564
  • [9] Robust bootstrapping of speaker models for unsupervised speaker indexing
    Fu, ZhongHua
    MULTIMEDIA CONTENT ANALYSIS AND MINING, PROCEEDINGS, 2007, 4577 : 122 - +
  • [10] The Use of Sense in Unsupervised Training of Acoustic Models for ASR Systems
    Singh, Rita
    Lambert, Benjamin
    Raj, Bhiksha
    11TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2010 (INTERSPEECH 2010), VOLS 3 AND 4, 2010, : 2938 - 2941