UNSUPERVISED PROSODIC PHRASE BOUNDARY LABELING OF MANDARIN SPEECH SYNTHESIS DATABASE USING CONTEXT-DEPENDENT HMM

Times cited: 0

Authors:

Yang, Chen-Yu [1]

Ling, Zhen-Hua [1]

Dai, Li-Rong [1]

Affiliation:

[1] Univ Sci & Technol China, Natl Engn Lab Speech & Language Informat Proc, Hefei 230026, Peoples R China

Source:

2013 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) | 2013

Keywords:

speech synthesis; phrase boundary; unsupervised labeling; context-dependent hidden Markov model; Viterbi decoding

DOI:

Not available

Chinese Library Classification (CLC) number:

O42 [Acoustics]

Subject classification codes:

070206; 082403

Abstract:

In this paper, an automatic, unsupervised method based on context-dependent hidden Markov models (CD-HMMs) is proposed for labeling the phrase boundary positions in a Mandarin speech synthesis database. Initial phrase boundary labels are predicted by clustering, in an unsupervised way, the durations of the pauses between every two adjacent prosodic words. Using these initial labels, CD-HMMs for the spectrum, F0, and phone duration are then estimated in a manner similar to HMM-based parametric speech synthesis. The labels are further updated by Viterbi decoding under the maximum likelihood criterion, given the acoustic feature sequences and the trained CD-HMMs. Model training and Viterbi decoding are conducted iteratively until convergence. Experimental results on a Mandarin speech synthesis database show that this method labels phrase boundary positions much more accurately than the text-analysis-based method, without requiring any manually labeled training data. A unit selection speech synthesis system built with the phrase boundary labels generated by the proposed method achieves performance similar to one built with manual labels.
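The abstract does not name the clustering algorithm, the acoustic features, or the model topology, so the following Python sketch is only a toy illustration of the two-stage idea under stated assumptions. It assumes 2-means clustering of pause durations for the initial labels, and it replaces the CD-HMMs and Viterbi decoding with per-class diagonal Gaussians over hypothetical per-boundary features, relabeling each boundary independently by maximum likelihood (a hard-EM loop). All function names, the variance floor, and the example data are hypothetical.

```python
import math
from statistics import mean, pvariance

VAR_FLOOR = 1e-4  # assumed variance floor so tiny or constant classes stay usable


def two_means_1d(values, max_iter=100):
    """Initial labels via k-means with k=2 on pause durations (assumed method).

    Returns 1 for a boundary in the long-pause cluster (phrase boundary), else 0.
    """
    c0, c1 = min(values), max(values)  # start from the shortest and longest pause
    labels = None
    for _ in range(max_iter):
        new_labels = [1 if abs(v - c1) < abs(v - c0) else 0 for v in values]
        if new_labels == labels:
            break  # assignments are stable: converged
        labels = new_labels
        short = [v for v, lab in zip(values, labels) if lab == 0]
        long_ = [v for v, lab in zip(values, labels) if lab == 1]
        if short:
            c0 = mean(short)
        if long_:
            c1 = mean(long_)
    return labels


def fit_gaussian(rows):
    """Per-dimension mean and floored variance of a list of feature vectors."""
    cols = list(zip(*rows))
    return [mean(c) for c in cols], [max(pvariance(c), VAR_FLOOR) for c in cols]


def log_likelihood(x, mu, var):
    """Log density of feature vector x under a diagonal Gaussian."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mu, var))


def refine_labels(features, labels, max_iter=20):
    """Toy stand-in for the train/decode loop: fit one model per class from the
    current labels, relabel every boundary by maximum likelihood, repeat until
    the labels stop changing."""
    for _ in range(max_iter):
        groups = {c: [f for f, lab in zip(features, labels) if lab == c] for c in (0, 1)}
        if not groups[0] or not groups[1]:
            break  # one class vanished; keep the current labels
        params = {c: fit_gaussian(rows) for c, rows in groups.items()}
        new_labels = [max((0, 1), key=lambda c: log_likelihood(f, *params[c]))
                      for f in features]
        if new_labels == labels:
            break  # converged
        labels = new_labels
    return labels


# Made-up data: pause duration (s) and a hypothetical pre-boundary lengthening ratio.
pauses = [0.0, 0.01, 0.02, 0.25, 0.30, 0.0, 0.28]
feats = list(zip(pauses, [1.0, 1.1, 0.9, 1.6, 1.7, 1.0, 1.5]))
print(two_means_1d(pauses))                         # [0, 0, 0, 1, 1, 0, 1]
print(refine_labels(feats, [0, 0, 0, 1, 1, 1, 1]))  # [0, 0, 0, 1, 1, 0, 1]
```

The second call starts from a deliberately wrong label on the sixth boundary and shows the re-estimate/relabel loop correcting it. In the method described above, labels are instead updated by Viterbi decoding of the acoustic feature sequences under the trained spectrum, F0, and phone duration CD-HMMs.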


Pages: 6875-6879

Number of pages: 5
