Syllable HMM based Mandarin TTS and Comparison with Concatenative TTS

Cited by: 0
Authors
Shuang, Zhiwei [1, 2]
Kang, Shiyin [3 ]
Shi, Qin [2 ]
Qin, Yong [2 ]
Cai, Lianhong [3 ]
Affiliations
[1] Univ Sci & Technol China, Hefei, Anhui, Peoples R China
[2] IBM China Res Lab, Beijing, Peoples R China
[3] Tsinghua Univ, Dept Comp Sci, Beijing, Peoples R China
Keywords
Mandarin; syllable; HMM; TTS; synthesis
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This paper introduces a Syllable HMM based Mandarin TTS system. Each syllable is modeled by a 10-state left-to-right HMM. We leverage the corpus and the front end of a concatenative TTS system to build the Syllable HMM based TTS system. Furthermore, we exploit the distinctive consonant/vowel structure of Mandarin syllables to improve the voiced/unvoiced decision for HMM states. Evaluation results show that the Syllable HMM based Mandarin TTS system, with a model size of 5.3 MB, can achieve an overall quality close to that of a concatenative TTS system with a data size of 1 GB.
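As a rough illustration of the "10-state left-to-right HMM" topology mentioned in the abstract, the sketch below builds a transition matrix in which each state can only stay put or advance to the next state. The abstract does not say whether state skips are allowed or what the transition probabilities are, so the no-skip structure, the self-loop value of 0.6, and the function names are assumptions made here for illustration only.

```python
def left_to_right_transitions(num_states=10, self_loop=0.6):
    """Build a transition matrix for a left-to-right HMM without skips.

    Row i holds the probabilities of moving out of state i. Each state
    either repeats (self-loop) or advances to state i + 1. The last row
    is an absorbing placeholder standing in for the exit of the model.
    The self_loop value is illustrative, not taken from the paper.
    """
    if not 0.0 < self_loop < 1.0:
        raise ValueError("self_loop must be strictly between 0 and 1")
    matrix = [[0.0] * num_states for _ in range(num_states)]
    for i in range(num_states - 1):
        matrix[i][i] = self_loop
        matrix[i][i + 1] = 1.0 - self_loop
    matrix[-1][-1] = 1.0
    return matrix


def expected_state_duration(self_loop):
    """Expected number of frames spent in a state (geometric duration)."""
    return 1.0 / (1.0 - self_loop)


# One syllable model with the 10 states named in the abstract.
transitions = left_to_right_transitions()
```

Under this sketch a backward move such as `transitions[3][2]` is always zero, which is what makes the model left-to-right. HMM-based synthesis systems often replace the geometric duration implied by the self-loop with an explicit duration model; whether this system does so is not stated in the abstract.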
Pages: 1755 / +
Page count: 2
Related papers
50 in total
  • [1] A segmental speech coder based on a concatenative TTS
    Lee, KS
    Cox, RV
    SPEECH COMMUNICATION, 2002, 38 (1-2) : 89 - 100
  • [2] HMM based TTS for Mixed Language Text
    Shuang, Zhiwei
    Kang, Shiyin
    Qin, Yong
    Dai, Lirong
    Cai, Lianhong
    11TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2010 (INTERSPEECH 2010), VOLS 1-2, 2010, : 618 - +
  • [3] HMM based duration control for singing TTS
    Khan, Najeeb Ullah
    Lee, Jung Chul
    Lecture Notes in Electrical Engineering, 2015, 373 : 137 - 143
  • [4] A comparison of pronunciation modeling approaches for HMM-TTS
    Webster, Gabriel
    Krstulovic, Sacha
    Knill, Kate
    11TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2010 (INTERSPEECH 2010), VOLS 3 AND 4, 2010, : 2190 - 2193
  • [5] Syllable clustering and spectral discontinuity in syllable-based TTS systems
    Chen, FX
    2003 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL I, PROCEEDINGS: SPEECH PROCESSING I, 2003, : 688 - 691
  • [6] A Comparison Between Allophone, Syllable, and Diphone Based TTS Systems for Kurdish Language
    Barkhoda, Wafa
    ZahirAzami, Bahram
    Bahrampour, Anvar
    Shahryari, Om-Kolsoom
    2009 IEEE INTERNATIONAL SYMPOSIUM ON SIGNAL PROCESSING AND INFORMATION TECHNOLOGY (ISSPIT 2009), 2009, : 557 - +
  • [7] A comparison between allophone, syllable, and diphone based TTS systems for Azerbaijan language
    Cybernetics Institute, Azerbaijan National Academy of Sciences, 9, F. Agayev str., AZ1141, Baku, Azerbaijan
    Mini EURO Conf. Continuous Optim. Inf.-Based Technol. Financ. Sect., MEC EurOPT, 1600, (300-305):
  • [8] Segment Specific Concatenation Cost for Syllable Based Bengali TTS
    Narendra, N. P.
    Rao, K. Sreenivasa
    CONTEMPORARY COMPUTING, 2011, 168 : 371 - 382
  • [9] Sinusoidal model parameterization for HMM-based TTS system
    Shechtman, Slava
    Sorin, Alex
    11TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2010 (INTERSPEECH 2010), VOLS 1-2, 2010, : 805 - 808
  • [10] Measuring the gap between HMM-based ASR and TTS
    Dines, John
    Yamagishi, Junichi
    King, Simon
    INTERSPEECH 2009: 10TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2009, VOLS 1-5, 2009, : 1411 - +