Exploiting Prosody Hierarchy and Dynamic Features for Pitch Modeling and Generation in HMM-Based Speech Synthesis

Cited by: 28
Authors
Hsia, Chi-Chun [1]
Wu, Chung-Hsien [2]
Wu, Jung-Yun [2]
Affiliations
[1] Ind Technol Res Inst S, ICT Enabled Healthcare Program, Tainan 709, Taiwan
[2] Natl Cheng Kung Univ, Dept Comp Sci & Informat Engn, Tainan 701, Taiwan
Source
IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING | 2010, Vol. 18, No. 08
Keywords
Dynamic features; hidden Markov model (HMM)-based speech synthesis; pitch modeling and generation; prosody hierarchy; INFORMATION; SELECTION; UNITS;
DOI
10.1109/TASL.2010.2040791
Chinese Library Classification number
O42 [Acoustics];
Discipline classification code
070206 ; 082403 ;
Abstract
This paper proposes a method for modeling and generating pitch in hidden Markov model (HMM)-based Mandarin speech synthesis by exploiting prosody hierarchy and dynamic pitch features. The prosodic structure of a sentence is represented by a prosody hierarchy, which is constructed from the predicted prosodic breaks using a supervised classification and regression tree (S-CART). The S-CART is trained by maximizing the proportional reduction of entropy to minimize the errors in the prediction of the prosodic breaks. The pitch contour of a speech sentence is estimated using the STRAIGHT algorithm and decomposed into the prosodic features (static features) at the prosodic word, syllable, and frame layers, based on the predicted prosodic structure. Dynamic features at each layer are estimated to preserve the temporal correlation between adjacent units. A hierarchical prosody model is constructed using an unsupervised CART (U-CART) for generating the pitch contour. Minimum description length (MDL) is adopted in U-CART training. Objective and subjective evaluations with statistical hypothesis testing were conducted, and the results were compared with the corresponding results for HMM-based pitch modeling. The comparison confirms the improved performance of the proposed method.
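The abstract does not state how the dynamic features are computed. As an illustration only, the sketch below uses the first-order delta window (-0.5, 0, 0.5) that is common in HMM-based synthesis; the window, the edge padding, and the function names `delta_features` and `with_dynamics` are assumptions, not details taken from the paper. The input is a sequence of static pitch values for adjacent units at one layer (for example, one value per syllable).

```python
def delta_features(static, window=(-0.5, 0.0, 0.5)):
    """Return one dynamic (delta) value per unit of a static sequence.

    Assumption: a symmetric regression window, with the first and last
    values repeated at the boundaries so every unit has two neighbours.
    """
    if not static:
        return []
    half = len(window) // 2
    # Repeat the boundary values so the window fits at both ends.
    padded = [static[0]] * half + list(static) + [static[-1]] * half
    # delta[t] = sum_k window[k] * padded[t + k]
    return [
        sum(w * padded[t + k] for k, w in enumerate(window))
        for t in range(len(static))
    ]


def with_dynamics(static):
    """Pair each static value with its delta: [(static, delta), ...]."""
    return list(zip(static, delta_features(static)))


if __name__ == "__main__":
    # Hypothetical per-syllable pitch values, for demonstration only.
    for static_value, delta_value in with_dynamics([1.0, 2.0, 4.0, 7.0]):
        print(static_value, delta_value)
```

With this window, each delta is half the difference between the following and preceding units, so it captures how pitch changes across adjacent units rather than its absolute level.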
Pages: 1994-2003
Page count: 10
Related papers
50 in total
  • [41] Thousands of Voices for HMM-based Speech Synthesis
    Yamagishi, Junichi
    Usabaev, Bela
    King, Simon
    Watts, Oliver
    Dines, John
    Tian, Jilei
    Hu, Rile
    Guan, Yong
    Oura, Keiichiro
    Tokuda, Keiichi
    Karhila, Reima
    Kurimo, Mikko
    INTERSPEECH 2009: 10TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2009, VOLS 1-5, 2009, : 416 - +
  • [42] The role of higher-level linguistic features in HMM-based speech synthesis
    Watts, Oliver
    Yamagishi, Junichi
    King, Simon
    11TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2010 (INTERSPEECH 2010), VOLS 1-2, 2010, : 841 - 844
  • [43] Analysis of HMM-Based Lombard Speech Synthesis
    Raitio, Tuomo
    Suni, Antti
    Vainio, Martti
    Alku, Paavo
    12TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2011 (INTERSPEECH 2011), VOLS 1-5, 2011, : 2792 - +
  • [44] A Parameter Generation Algorithm Using Local Variance for HMM-Based Speech Synthesis
    Nose, Takashi
    Chunwijitra, Vataya
    Kobayashi, Takao
    IEEE JOURNAL OF SELECTED TOPICS IN SIGNAL PROCESSING, 2014, 8 (02) : 221 - 228
  • [45] Parameter Generation Considering LSP Ordering Property for HMM-Based Speech Synthesis
    Qian, Shijun
    Wang, Huanliang
    Pei, Wenjiang
    Wang, Kai
    IEEE SIGNAL PROCESSING LETTERS, 2012, 19 (08) : 467 - 470
  • [46] PARAMETER GENERATION ALGORITHM CONSIDERING MODULATION SPECTRUM FOR HMM-BASED SPEECH SYNTHESIS
    Takamichi, Shinnosuke
    Toda, Tomoki
    Black, Alan W.
    Nakamura, Satoshi
    2015 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING (ICASSP), 2015, : 4210 - 4214
  • [47] An improved minimum generation error based model adaptation for HMM-based speech synthesis
    Wu, Yi-Jian
    Qin, Long
    Tokuda, Keiichi
    INTERSPEECH 2009: 10TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2009, VOLS 1-5, 2009, : 1727 - +
  • [48] Asynchronous F0 and Spectrum Modeling for HMM-Based Speech Synthesis
    Wang, Cheng-Cheng
    Ling, Zhen-Hua
    Dai, Li-Rong
    INTERSPEECH 2009: 10TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2009, VOLS 1-5, 2009, : 412 - 415
  • [49] Modeling and interpolation of Austrian German and Viennese dialect in HMM-based speech synthesis
    Pucher, Michael
    Schabus, Dietmar
    Yamagishi, Junichi
    Neubarth, Friedrich
    Strom, Volker
    SPEECH COMMUNICATION, 2010, 52 (02) : 164 - 179
  • [50] A Hierarchical F0 Modeling Method for HMM-based Speech Synthesis
    Lei, Ming
    Wu, Yi-Jian
    Soong, Frank K.
    Ling, Zhen-Hua
    Dai, Li-Rong
    11TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2010 (INTERSPEECH 2010), VOLS 3 AND 4, 2010, : 2170 - +