UNSUPERVISED PROSODIC PHRASE BOUNDARY LABELING OF MANDARIN SPEECH SYNTHESIS DATABASE USING CONTEXT-DEPENDENT HMM

被引:0
|
作者
Yang, Chen-Yu [1 ]
Ling, Zhen-Hua [1 ]
Dai, Li-Rong [1 ]
机构
[1] Univ Sci & Technol China, Natl Engn Lab Speech & Language Informat Proc, Hefei 230026, Peoples R China
关键词
speech synthesis; phrase boundary; unsupervised labeling; context-dependent hidden Markov model; Viterbi decoding;
D O I
暂无
中图分类号
O42 [声学];
学科分类号
070206 ; 082403 ;
摘要
In this paper, an automatic and unsupervised method based on context-dependent hidden Markov model (CD-HMM) is proposed for labeling the phrase boundary positions of a Mandarin speech synthesis database. The initial phrase boundary labels are predicted by clustering the durations of the pauses between every two prosodic words in an unsupervised way. Then, the CD-HMMs for the spectrum, F0 and phone duration are estimated by a means similar to the HMM-based parametric speech synthesis using the initial phrase boundary labels. These labels are further updated by Viterbi decoding under the maximum likelihood criterion given the acoustic feature sequences and the trained CD-HMMs. The model training and Viterbi decoding procedures are conducted iteratively until convergence. Experimental results on a Mandarin speech synthesis database show that this method is able to label the phrase boundary positions much more accurately than the text-analysis-based method without requiring any manually labeled training data. The unit selection speech synthesis system constructed using the phrase boundary labels generated by our proposed method achieves similar performance to that using the manual labels.
引用
收藏
页码:6875 / 6879
页数:5
相关论文
共 50 条
  • [1] Unsupervised Prosodic Labeling of Speech Synthesis Databases Using Context-Dependent HMMs
    Yang, Chen-Yu
    Ling, Zhen-Hua
    Dai, Li-Rong
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2014, E97D (06): : 1449 - 1460
  • [2] Automatic Phrase Boundary Labeling of Speech Synthesis Database Using Context-Dependent HMMs and N-Gram Prior Distributions
    Chen, Qian
    Ling, Zhen-Hua
    Yang, Chen-Yu
    Dai, Li-Rong
    16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5, 2015, : 1581 - 1585
  • [3] Prosodic variation enhancement using unsupervised context labeling for HMM-based expressive speech synthesis
    Maeno, Yu
    Nose, Takashi
    Kobayashi, Takao
    Koriyama, Tomoki
    Ijima, Yusuke
    Nakajima, Hideharu
    Mizuno, Hideyuki
    Yoshioka, Osamu
    SPEECH COMMUNICATION, 2014, 57 : 144 - 154
  • [4] HMM-Based Emphatic Speech Synthesis Using Unsupervised Context Labeling
    Maeno, Yu
    Nose, Takashi
    Kobayashi, Takao
    Ijima, Yusuke
    Nakajima, Hideharu
    Mizuno, Hideyuki
    Yoshioka, Osamu
    12TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2011 (INTERSPEECH 2011), VOLS 1-5, 2011, : 1860 - +
  • [5] HMM-Based Thai Speech Synthesis Using Unsupervised Stress Context Labeling
    Moungsri, Decha
    Koriyama, Tomoki
    Kobayashi, Takao
    2014 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA), 2014,
  • [6] Context-Dependent Labels for an HMM-Based Speech Synthesis System for Malay
    Mustafa, Mumtaz B.
    Don, Zuraidah M.
    Knowles, Gerry
    2013 INTERNATIONAL CONFERENCE ORIENTAL COCOSDA HELD JOINTLY WITH 2013 CONFERENCE ON ASIAN SPOKEN LANGUAGE RESEARCH AND EVALUATION (O-COCOSDA/CASLRE), 2013,
  • [7] HMM-BASED EXPRESSIVE SPEECH SYNTHESIS BASED ON PHRASE-LEVEL F0 CONTEXT LABELING
    Maeno, Yu
    Nose, Takashi
    Kobayashi, Takao
    Koriyama, Tomoki
    Ijima, Yusuke
    Nakajima, Hideharu
    Mizuno, Hideyuki
    Yoshioka, Osamu
    2013 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2013, : 7859 - 7863
  • [8] Context-dependent hybrid HME/HMM speech recognition using polyphone clustering decision trees
    Fritsch, J
    Finke, M
    Waibel, A
    1997 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS I - V: VOL I: PLENARY, EXPERT SUMMARIES, SPECIAL, AUDIO, UNDERWATER ACOUSTICS, VLSI; VOL II: SPEECH PROCESSING; VOL III: SPEECH PROCESSING, DIGITAL SIGNAL PROCESSING; VOL IV: MULTIDIMENSIONAL SIGNAL PROCESSING, NEURAL NETWORKS - VOL V: STATISTICAL SIGNAL AND ARRAY PROCESSING, APPLICATIONS, 1997, : 1759 - 1762
  • [9] Context-dependent additive log F0 model for HMM-based speech synthesis
    Zen, Heiga
    Braunschweiler, Norbert
    INTERSPEECH 2009: 10TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2009, VOLS 1-5, 2009, : 2039 - 2042
  • [10] Integration of context-dependent durational knowledge into HMM-based speech recognition
    Wang, X
    tenBosch, LFM
    Pols, LCW
    ICSLP 96 - FOURTH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, PROCEEDINGS, VOLS 1-4, 1996, : 1073 - 1076