Exploiting Prosody Hierarchy and Dynamic Features for Pitch Modeling and Generation in HMM-Based Speech Synthesis

Cited by: 28
Authors
Hsia, Chi-Chun [1]
Wu, Chung-Hsien [2]
Wu, Jung-Yun [2]
Affiliations
[1] Ind Technol Res Inst S, ICT Enabled Healthcare Program, Tainan 709, Taiwan
[2] Natl Cheng Kung Univ, Dept Comp Sci & Informat Engn, Tainan 701, Taiwan
Source
IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING | 2010 / Vol. 18 / No. 08
Keywords
Dynamic features; hidden Markov model (HMM)-based speech synthesis; pitch modeling and generation; prosody hierarchy; INFORMATION; SELECTION; UNITS;
DOI
10.1109/TASL.2010.2040791
Chinese Library Classification number
O42 [Acoustics];
Subject classification code
070206 ; 082403 ;
Abstract
This paper proposes a method for modeling and generating pitch in hidden Markov model (HMM)-based Mandarin speech synthesis by exploiting prosody hierarchy and dynamic pitch features. The prosodic structure of a sentence is represented by a prosody hierarchy, which is constructed from the predicted prosodic breaks using a supervised classification and regression tree (S-CART). The S-CART is trained by maximizing the proportional reduction of entropy to minimize the errors in the prediction of the prosodic breaks. The pitch contour of a speech sentence is estimated using the STRAIGHT algorithm and decomposed into the prosodic features (static features) at prosodic word, syllable, and frame layers, based on the predicted prosodic structure. Dynamic features at each layer are estimated to preserve the temporal correlation between adjacent units. A hierarchical prosody model is constructed using an unsupervised CART (U-CART) for generating pitch contour. Minimum description length (MDL) is adopted in U-CART training. Objective and subjective evaluations with statistical hypothesis testing were conducted, and the results compared to corresponding results for HMM-based pitch modeling. The comparison confirms the improved performance of the proposed method.
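The abstract states that the S-CART is trained by maximizing the proportional reduction of entropy. As a rough illustration only, the sketch below computes that quantity for one candidate binary split, assuming the common definition (parent entropy minus the size-weighted child entropy, divided by the parent entropy). The function names and the "B"/"N" break labels are hypothetical and are not taken from the paper, whose exact formulation may differ.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def proportional_reduction_of_entropy(parent, left, right):
    """Fraction of the parent's entropy removed by splitting it into left/right.

    Assumed definition: (H(parent) - weighted H(children)) / H(parent).
    """
    h_parent = entropy(parent)
    if h_parent == 0.0:
        return 0.0  # a pure node cannot be improved by splitting
    n = len(parent)
    h_children = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return (h_parent - h_children) / h_parent


# Hypothetical labels: "B" = prosodic break, "N" = no break.
parent = ["B", "B", "N", "N"]
perfect = proportional_reduction_of_entropy(parent, ["B", "B"], ["N", "N"])
useless = proportional_reduction_of_entropy(parent, ["B", "N"], ["B", "N"])
```

Under this definition, a split that separates the two classes completely scores 1, and a split that leaves each child with the same class mix as the parent scores 0. A tree trainer following this criterion would pick, at each node, the question with the highest score.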
Pages: 1994-2003
Page count: 10