On the Utility of Syllable-Based Acoustic Models for Pronunciation Variation Modelling

Cited by: 0
Authors
Annika Hämäläinen
Lou Boves
Johan de Veth
Louis ten Bosch
Affiliation
[1] Radboud University Nijmegen, Centre for Language and Speech Technology (CLST), Faculty of Arts
Keywords
Acoustics; Speech Recognition; Substantial Effect; Recognition Performance; Considerable Improvement;
DOI
Not available
Abstract
Recent research on the TIMIT corpus suggests that longer-length acoustic models are more appropriate for pronunciation variation modelling than the context-dependent phones that conventional automatic speech recognisers use. However, the impressive speech recognition results obtained with longer-length models on TIMIT remain to be reproduced on other corpora. To understand the conditions in which longer-length acoustic models result in considerable improvements in recognition performance, we carry out recognition experiments on both TIMIT and the Spoken Dutch Corpus and analyse the differences between the two sets of results. We establish that the details of the procedure used for initialising the longer-length models have a substantial effect on the speech recognition results. When initialised appropriately, longer-length acoustic models that borrow their topology from a sequence of triphones cannot capture the pronunciation variation phenomena that hinder recognition performance the most.
Related papers
  • [1] On the Utility of Syllable-Based Acoustic Models for Pronunciation Variation Modelling
    Hamalainen, Annika
    Boves, Lou
    de Veth, Johan
    ten Bosch, Louis
    EURASIP JOURNAL ON AUDIO SPEECH AND MUSIC PROCESSING, 2007, 2007 (1)
  • [2] PHRASE RECOGNIZER USING SYLLABLE-BASED ACOUSTIC MEASUREMENTS
    JOHNSON, DH
    WEINSTEIN, CJ
    IEEE TRANSACTIONS ON ACOUSTICS SPEECH AND SIGNAL PROCESSING, 1978, 26 (05): : 409 - 418
  • [3] Syllable-Based Acoustic Modeling with CTC-SMBR-LSTM
    Qu, Zhongdi
    Haghani, Parisa
    Weinstein, Eugene
    Moreno, Pedro
    2017 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU), 2017, : 173 - 177
  • [4] Pronunciation Modeling of Loanwords for Korean ASR Using Phonological Knowledge and Syllable-based Segmentation
    Ryu, Hyuksu
    Na, Minsu
    Chung, Minhwa
    2015 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA), 2015, : 430 - 435
  • [5] Modelling pronunciation variation with single-path and multi-path syllable models: Issues to consider
    Hamalainen, Annika
    ten Bosch, Louis
    Boves, Lou
    SPEECH COMMUNICATION, 2009, 51 (02) : 130 - 150
  • [6] AUTOMATIC PROSODIC EVENTS DETECTION USING SYLLABLE-BASED ACOUSTIC AND SYNTACTIC FEATURES
    Jeon, Je Hun
    Liu, Yang
    2009 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS 1- 8, PROCEEDINGS, 2009, : 4565 - 4568
  • [7] Thai syllable-based information extraction using hidden Markov models
    Narupiyakul, L
    Thomas, C
    Cercone, N
    Sirinaovakul, B
    COMPUTATIONAL LINGUISTICS AND INTELLIGENT TEXT PROCESSING, 2004, 2945 : 537 - 546
  • [8] Syllable-based Malay Word Stemmer
    Lee, JunChoi
    Othman, Rosita Mohamad
    Mohamad, Nurul Zawiyah
    2013 IEEE SYMPOSIUM ON COMPUTERS AND INFORMATICS (ISCI 2013), 2013,
  • [9] Syllable-based Compression for XML Documents
    Chernik, Katsiaryna
    Lansky, Jan
    Galambos, Leo
    DATESO 2006 - DATABASES, TEXTS, SPECIFICATIONS, OBJECTS: PROCEEDINGS OF THE 6TH ANNUAL INTERNATIONAL WORKSHOP, 2006, 176 : 21 - 31
  • [10] Syllable clustering and spectral discontinuity in syllable-based TTS systems
    Chen, FX
    2003 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL I, PROCEEDINGS: SPEECH PROCESSING I, 2003, : 688 - 691