Exploiting Prosody Hierarchy and Dynamic Features for Pitch Modeling and Generation in HMM-Based Speech Synthesis

Cited by: 28
Authors
Hsia, Chi-Chun [1]
Wu, Chung-Hsien [2]
Wu, Jung-Yun [2]
Affiliations
[1] Ind Technol Res Inst S, ICT Enabled Healthcare Program, Tainan 709, Taiwan
[2] Natl Cheng Kung Univ, Dept Comp Sci & Informat Engn, Tainan 701, Taiwan
Source
IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING | 2010 / Vol. 18 / No. 08
Keywords
Dynamic features; hidden Markov model (HMM)-based speech synthesis; pitch modeling and generation; prosody hierarchy; INFORMATION; SELECTION; UNITS;
DOI
10.1109/TASL.2010.2040791
Chinese Library Classification (CLC) number
O42 [Acoustics];
Discipline classification code
070206; 082403;
Abstract
This paper proposes a method for modeling and generating pitch in hidden Markov model (HMM)-based Mandarin speech synthesis by exploiting prosody hierarchy and dynamic pitch features. The prosodic structure of a sentence is represented by a prosody hierarchy, which is constructed from the predicted prosodic breaks using a supervised classification and regression tree (S-CART). The S-CART is trained by maximizing the proportional reduction of entropy to minimize the errors in the prediction of the prosodic breaks. The pitch contour of a speech sentence is estimated using the STRAIGHT algorithm and decomposed into the prosodic features (static features) at prosodic word, syllable, and frame layers, based on the predicted prosodic structure. Dynamic features at each layer are estimated to preserve the temporal correlation between adjacent units. A hierarchical prosody model is constructed using an unsupervised CART (U-CART) for generating pitch contour. Minimum description length (MDL) is adopted in U-CART training. Objective and subjective evaluations with statistical hypothesis testing were conducted, and the results compared to corresponding results for HMM-based pitch modeling. The comparison confirms the improved performance of the proposed method.
Pages: 1994-2003
Number of pages: 10