Exploiting Prosody Hierarchy and Dynamic Features for Pitch Modeling and Generation in HMM-Based Speech Synthesis

Cited by: 28
Authors
Hsia, Chi-Chun [1]
Wu, Chung-Hsien [2]
Wu, Jung-Yun [2]
Affiliations
[1] Ind Technol Res Inst S, ICT Enabled Healthcare Program, Tainan 709, Taiwan
[2] Natl Cheng Kung Univ, Dept Comp Sci & Informat Engn, Tainan 701, Taiwan
Source
IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING | 2010 / Vol. 18 / Issue 08
Keywords
Dynamic features; hidden Markov model (HMM)-based speech synthesis; pitch modeling and generation; prosody hierarchy; INFORMATION; SELECTION; UNITS;
DOI
10.1109/TASL.2010.2040791
Chinese Library Classification number
O42 [Acoustics];
Discipline classification codes
070206 ; 082403 ;
Abstract
This paper proposes a method for modeling and generating pitch in hidden Markov model (HMM)-based Mandarin speech synthesis by exploiting prosody hierarchy and dynamic pitch features. The prosodic structure of a sentence is represented by a prosody hierarchy, which is constructed from the predicted prosodic breaks using a supervised classification and regression tree (S-CART). The S-CART is trained by maximizing the proportional reduction of entropy to minimize the errors in the prediction of the prosodic breaks. The pitch contour of a speech sentence is estimated using the STRAIGHT algorithm and decomposed into the prosodic features (static features) at prosodic word, syllable, and frame layers, based on the predicted prosodic structure. Dynamic features at each layer are estimated to preserve the temporal correlation between adjacent units. A hierarchical prosody model is constructed using an unsupervised CART (U-CART) for generating pitch contour. Minimum description length (MDL) is adopted in U-CART training. Objective and subjective evaluations with statistical hypothesis testing were conducted, and the results compared to corresponding results for HMM-based pitch modeling. The comparison confirms the improved performance of the proposed method.
Pages: 1994 - 2003
Page count: 10
Related papers
50 in total
  • [21] Pitch-Scaled Spectrum Based Excitation Model for HMM-based Speech Synthesis
    Zhengqi Wen
    Jianhua Tao
    Shifeng Pan
    Yang Wang
    Journal of Signal Processing Systems, 2014, 74 : 423 - 435
  • [22] Arabic HMM-based Speech Synthesis
    Khalil, Krichi Mohamed
    Adnan, Cherif
    2013 INTERNATIONAL CONFERENCE ON ELECTRICAL ENGINEERING AND SOFTWARE APPLICATIONS (ICEESA), 2013, : 450 - 454
  • [23] HMM-Based Vietnamese Speech Synthesis
    Trinh, Son
    Hoang, Kiem
    INTERNATIONAL JOURNAL OF SOFTWARE INNOVATION, 2015, 3 (04) : 33 - 47
  • [24] Pitch dependent phone modelling for HMM-based speech recognition
    Singer, H.
    Sagayama, S.
    Journal of the Acoustical Society of Japan (E) (English translation of Nippon Onkyo Gakkaishi), 1994, 15 (02):
  • [25] Generation of creaky voice for improving the quality of HMM-based speech synthesis
    Narendra, N. P.
    Rao, K. Sreenivasa
    COMPUTER SPEECH AND LANGUAGE, 2017, 42 : 38 - 58
  • [26] FULL COVARIANCE STATE DURATION MODELING FOR HMM-BASED SPEECH SYNTHESIS
    Lu, Heng
    Wu, Yi-Jian
    Tokuda, Keiichi
    Dai, Li-Rong
    Wang, Ren-Hua
    2009 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS 1-8, PROCEEDINGS, 2009, : 4033 - +
  • [27] A speech parameter generation algorithm considering global variance for HMM-based speech synthesis
    Toda, Tomoki
    Tokuda, Keiichi
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2007, E90D (05) : 816 - 824
  • [28] SPEECH PARAMETER GENERATION CONSIDERING LSP ORDERING PROPERTY FOR HMM-BASED SPEECH SYNTHESIS
    Qian, Shijun
    Wang, Huanliang
    Pei, Wenjiang
    Zou, Ping
    Wang, Kai
    2012 PROCEEDINGS OF THE 20TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO), 2012, : 330 - 334
  • [29] Excitation Modeling for HMM-based Speech Synthesis Based on Principal Component Analysis
    Narendra, N. P.
    Reddy, M. Kiran
    Rao, K. Sreenivasa
    2016 TWENTY SECOND NATIONAL CONFERENCE ON COMMUNICATION (NCC), 2016,
  • [30] CROSS-STREAM DEPENDENCY MODELING FOR HMM-BASED SPEECH SYNTHESIS
    Ling, Zhen-Hua
    Zhang, Wei
    Wang, Ren-Hua
    2008 6TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING, PROCEEDINGS, 2008, : 5 - 8