UNSUPERVISED PROSODIC PHRASE BOUNDARY LABELING OF MANDARIN SPEECH SYNTHESIS DATABASE USING CONTEXT-DEPENDENT HMM

被引:0
|
作者
Yang, Chen-Yu [1 ]
Ling, Zhen-Hua [1 ]
Dai, Li-Rong [1 ]
机构
[1] Univ Sci & Technol China, Natl Engn Lab Speech & Language Informat Proc, Hefei 230026, Peoples R China
关键词
speech synthesis; phrase boundary; unsupervised labeling; context-dependent hidden Markov model; Viterbi decoding;
D O I
暂无
中图分类号
O42 [声学];
学科分类号
070206 ; 082403 ;
摘要
In this paper, an automatic and unsupervised method based on context-dependent hidden Markov model (CD-HMM) is proposed for labeling the phrase boundary positions of a Mandarin speech synthesis database. The initial phrase boundary labels are predicted by clustering the durations of the pauses between every two prosodic words in an unsupervised way. Then, the CD-HMMs for the spectrum, F0 and phone duration are estimated by a means similar to the HMM-based parametric speech synthesis using the initial phrase boundary labels. These labels are further updated by Viterbi decoding under the maximum likelihood criterion given the acoustic feature sequences and the trained CD-HMMs. The model training and Viterbi decoding procedures are conducted iteratively until convergence. Experimental results on a Mandarin speech synthesis database show that this method is able to label the phrase boundary positions much more accurately than the text-analysis-based method without requiring any manually labeled training data. The unit selection speech synthesis system constructed using the phrase boundary labels generated by our proposed method achieves similar performance to that using the manual labels.
引用
收藏
页码:6875 / 6879
页数:5
相关论文
共 50 条
  • [42] A New Method for F0 Tracking Errors Fix and Generation in HMM-based Mandarin Speech Synthesis using Generation Process Model
    Wang, Miaomiao
    Wen, Miaomiao
    Hirose, Keikichi
    Minematsu, Nobuaki
    2010 IEEE 10TH INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING PROCEEDINGS (ICSP2010), VOLS I-III, 2010, : 609 - 612
  • [43] Transform Mapping Using Shared Decision Tree Context Clustering for HMM-Based Cross-Lingual Speech Synthesis
    Nagahama, Daiki
    Nose, Takashi
    Koriyama, Tomoki
    Kobayashi, Takao
    15TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2014), VOLS 1-4, 2014, : 770 - 774
  • [44] UNSUPERVISED CROSS-LINGUAL SPEAKER ADAPTATION FOR HMM-BASED SPEECH SYNTHESIS USING TWO-PASS DECISION TREE CONSTRUCTION
    Gibson, Matthew
    Hirsimaki, Teemu
    Karhila, Reima
    Kurimo, Mikko
    Byrne, William
    2010 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2010, : 4642 - 4645
  • [45] Analysis of unsupervised cross-lingual speaker adaptation for HMM-based speech synthesis using KLD-based transform mapping
    Oura, Keiichiro
    Yamagishi, Junichi
    Wester, Mirjam
    King, Simon
    Tokuda, Keiichi
    SPEECH COMMUNICATION, 2012, 54 (06) : 703 - 714
  • [46] Visuo-Phonetic Decoding using Multi-Stream and Context-Dependent Models for an Ultrasound-based Silent Speech Interface
    Hueber, Thomas
    Benaroya, Elie-Laurent
    Chollet, Gerard
    Denby, Bruce
    Dreyfus, Gerard
    Stone, Maureen
    INTERSPEECH 2009: 10TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2009, VOLS 1-5, 2009, : 628 - +
  • [47] Unsupervised Intralingual and Cross-Lingual Speaker Adaptation for HMM-Based Speech Synthesis Using Two-Pass Decision Tree Construction
    Gibson, Matthew
    Byrne, William
    IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2011, 19 (04): : 895 - 904
  • [48] Weighted finite-state transducer-based dysarthric speech recognition error correction using context-dependent pronunciation variation modelling
    Seong, Woo Kyeong
    Park, Ji Hun
    INTERNATIONAL JOURNAL OF ENGINEERING SYSTEMS MODELLING AND SIMULATION, 2014, 6 (1-2) : 4 - 11
  • [49] Improving F0 Prediction Using Bidirectional Associative Memories and Syllable-Level F0 Features for HMM-based Mandarin Speech Synthesis
    Gao, Li
    Ling, Zhen-Hua
    Chen, Ling-Hui
    Dai, Li-Rong
    2014 9TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP), 2014, : 275 - 279
  • [50] TONAL CONTEXT LABELING USING QUANTIZED F0 SYMBOLS FOR IMPROVING TONE CORRECTNESS IN AVERAGE-VOICE-BASED SPEECH SYNTHESIS
    Chunwijitra, Vataya
    Nose, Takashi
    Kobayashi, Takao
    2011 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2011, : 4708 - 4711