Applying generation process model constraint to fundamental frequency contours generated by hidden-Markov-model-based speech synthesis

Cited by: 4

Authors:

Matsuda, Tetsuya [1]

Hirose, Keikichi [1]

Minematsu, Nobuaki [1]

Affiliation:

[1] Univ Tokyo, Grad Sch Informat Sci & Technol, Bunkyo Ku, 7-3-1 Hongo, Tokyo 1138656, Japan

Source:

ACOUSTICAL SCIENCE AND TECHNOLOGY | 2012 / Vol. 33 / No. 04

Keywords:

F-0 contour; Generation process model; HMM-based speech synthesis;

DOI:

10.1250/ast.33.221

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Subject classification code:

070206 ; 082403 ;

Abstract:

Speech synthesis based on hidden Markov models (HMMs) processes the segmental and prosodic features of speech together in a frame-by-frame manner. One benefit of this method is that the time alignment of the two kinds of features is kept automatically. However, when the training data are limited, a frame-by-frame representation is not appropriate for prosodic features, which are tightly related to speech units spanning a wide time range, such as words and phrases. This causes an inherent problem in fundamental frequency (F-0) contour generation by HMM-based speech synthesis. A method is developed to modify F-0 contours in the framework of the generation process model (henceforth, F-0 model) by referring to linguistic information of the input text (word boundaries and accent types). It takes the F-0 variances obtained through HMM-based speech synthesis into account during the process. A listening experiment on synthetic speech shows that, on average, the method generates speech of better quality than HMM-based speech synthesis alone. Since the F-0 model can clearly relate its commands to linguistic (and para-/non-linguistic) information, the method has an additional advantage: changing speech styles and/or adding further information (such as emphasis) can easily be done by manipulating the commands.
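
The abstract gives no equations, so the following is only a minimal sketch of the generation process model in its commonly cited Fujisaki form, where log F-0 is a baseline plus phrase components (impulse responses) and accent components (clipped step responses). The function names, the constants alpha = 3.0, beta = 20.0 and gamma = 0.9, and all command values are illustrative assumptions, not the authors' implementation; the paper's use of HMM-derived F-0 variances is not modeled here.

```python
import math

def phrase_response(t, alpha=3.0):
    """Impulse response of the phrase control mechanism (zero before onset)."""
    return alpha * alpha * t * math.exp(-alpha * t) if t >= 0 else 0.0

def accent_response(t, beta=20.0, gamma=0.9):
    """Step response of the accent control mechanism, clipped at gamma."""
    if t < 0:
        return 0.0
    return min(1.0 - (1.0 + beta * t) * math.exp(-beta * t), gamma)

def f0_contour(times, fb, phrase_cmds, accent_cmds):
    """Return F0 values in Hz at the given times (seconds).

    phrase_cmds: list of (onset, magnitude)          -- assumed layout
    accent_cmds: list of (onset, offset, amplitude)  -- assumed layout
    """
    contour = []
    for t in times:
        log_f0 = math.log(fb)  # baseline frequency
        for t0, ap in phrase_cmds:
            log_f0 += ap * phrase_response(t - t0)
        for t1, t2, aa in accent_cmds:
            log_f0 += aa * (accent_response(t - t1) - accent_response(t - t2))
        contour.append(math.exp(log_f0))
    return contour

# Illustrative commands: one phrase command and one accent command.
times = [i * 0.01 for i in range(200)]
baseline = f0_contour(times, fb=100.0,
                      phrase_cmds=[(0.0, 0.5)],
                      accent_cmds=[(0.3, 0.8, 0.4)])

# Emphasis as a command manipulation: enlarge the accent amplitude only.
emphasized = f0_contour(times, fb=100.0,
                        phrase_cmds=[(0.0, 0.5)],
                        accent_cmds=[(0.3, 0.8, 0.6)])
```

Because the whole contour is determined by a handful of commands, raising one accent amplitude (as in `emphasized`) lifts F-0 only over that accent's span, which is the kind of manipulation the abstract refers to.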


Pages: 221-228

Page count: 8

50 entries in total

[1] REPRESENTING FUNDAMENTAL FREQUENCY CONTOURS GENERATED BY HMM-BASED SPEECH SYNTHESIS USING GENERATION PROCESS MODEL
Hirose, Keikichi
Matsuda, Tatsuya
Hashimoto, Hiroya
Minematsu, Nobuaki
2011 IEEE INTERNATIONAL WORKSHOP ON MACHINE LEARNING FOR SIGNAL PROCESSING (MLSP), 2011,
[2] Use of Generation Process Model for Synthesizing Fundamental Frequency Contours in HMM-based Speech Synthesis
Hirose, Keikichi
Hashimoto, Hiroya
Ikeshima, Jun
Minematsu, Nobuaki
PROCEEDINGS OF 2012 IEEE 11TH INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING (ICSP) VOLS 1-3, 2012, : 575 - +
[3] Control of Fundamental Frequency Contours Using the Generation Process Model in HMM-Based Speech Synthesis
Matsuda, Tetsuya
Hirose, Keikichi
Minematsu, Nobuaki
2010 IEEE 10TH INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING PROCEEDINGS (ICSP2010), VOLS I-III, 2010, : 617 - 620
[4] Generation of Fundamental Frequency Contours for Thai Speech Synthesis using Tone Nucleus Model
Krityakien, Oraphan
Hirose, Keikichi
Minematsu, Nobuaki
14TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2013), VOLS 1-5, 2013, : 1036 - 1040
[5] Modelling and estimation of the fundamental frequency of speech using a hidden Markov model
Taylor, John H.
Milner, Ben
14TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2013), VOLS 1-5, 2013, : 1925 - 1929
[6] Hidden-Markov-model-based voice activity detector with high speech detection rate for speech enhancement
Veisi, H.
Sameti, H.
IET SIGNAL PROCESSING, 2012, 6 (01) : 54 - 63
[7] Modeling of Fundamental Frequency Contours for HMM-based Speech Synthesis: Representation of fundamental frequency contours for statistical speech synthesis
Hirose, Keikichi
PROCEEDINGS OF 2016 IEEE 13TH INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING (ICSP 2016), 2016, : 171 - 176
[8] Control of Prosodic Focus in Corpus-based Generation of Fundamental Frequency Contours Based on the Generation Process Model
Hirose, Keikichi
Ochi, Keiko
Minematsu, Nobuaki
2010 IEEE 10TH INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING PROCEEDINGS (ICSP2010), VOLS I-III, 2010, : 629 - 632
[9] Improved Automatic Extraction of Generation Process Model Commands and Its use for Generating Fundamental Frequency Contours for Training HMM-based Speech Synthesis
Hashimoto, Hiroya
Hirose, Keikichi
Minematsu, Nobuaki
13TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2012 (INTERSPEECH 2012), VOLS 1-3, 2012, : 458 - 461
[10] Automatic parameter extraction of fundamental frequency contours of speech based on a generative model
Fujisaki, H
Ohno, S
Tomita, O
ICSP '96 - 1996 3RD INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING, PROCEEDINGS, VOLS I AND II, 1996, : 729 - 732
