A style control technique for HMM-based expressive speech synthesis

Cited by: 90
Authors
Nose, Takashi [1 ]
Yamagishi, Junichi [1 ]
Masuko, Takashi [1 ]
Kobayashi, Takao [1 ]
Affiliations
[1] Tokyo Inst Technol, Interdisciplinary Grad Sch Sci & Engn, Yokohama, Kanagawa 2268502, Japan
Keywords
HMM-based speech synthesis; speaking style; emotional expression; style interpolation; hidden semi-Markov model (HSMM); multiple-regression HSMM (MRHSMM);
DOI
10.1093/ietisy/e90-d.9.1406
CLC (Chinese Library Classification) number
TP [Automation technology, computer technology];
Subject classification code
0812 ;
Abstract
This paper describes a technique for controlling the degree of expressivity of a desired emotional expression and/or speaking style of synthesized speech in an HMM-based speech synthesis framework. With this technique, multiple emotional expressions and speaking styles of speech are modeled in a single model by using a multiple-regression hidden semi-Markov model (MRHSMM). A set of control parameters, called the style vector, is defined, and each speech synthesis unit is modeled by using the MRHSMM, in which mean parameters of the state output and duration distributions are expressed by multiple-regression of the style vector. In the synthesis stage, the mean parameters of the synthesis units are modified by transforming an arbitrarily given style vector that corresponds to a point in a low-dimensional space, called style space, each of whose coordinates represents a certain specific speaking style or emotion of speech. The results of subjective evaluation tests show that style and its intensity can be controlled by changing the style vector.
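The core mechanism the abstract describes — state mean parameters expressed as a multiple regression of a style vector — can be illustrated with a small sketch. This is not the authors' implementation; the matrix `H`, the dimensions, and the `state_mean` helper are hypothetical, chosen only to show how a point in a low-dimensional style space linearly shifts the mean parameters, which is what makes style intensity controllable.

```python
# Illustrative sketch (assumed names and toy dimensions, not the paper's code):
# in an MRHSMM, the mean vector of each state's output (or duration)
# distribution is a linear function of a style vector v: mu = H @ [1; v].
import numpy as np

rng = np.random.default_rng(0)

dim = 4        # dimensionality of one state's mean parameter vector (toy value)
n_styles = 2   # style-space dimensions, e.g. a "joyful" axis and a "sad" axis

# Hypothetical regression matrix for one state: one bias column plus one
# column per style dimension; in the paper these are estimated in training.
H = rng.standard_normal((dim, 1 + n_styles))

def state_mean(style_vector):
    """Mean parameters as multiple regression of the style vector."""
    v = np.concatenate(([1.0], np.asarray(style_vector, dtype=float)))
    return H @ v  # mu = H @ [1; v]

neutral = state_mean([0.0, 0.0])      # origin of the style space
mild_joy = state_mean([0.3, 0.0])     # weak intensity along the first axis
strong_joy = state_mean([1.0, 0.0])   # full intensity along the first axis

# The mean moves linearly along each style axis, so scaling the style
# vector scales the deviation from the neutral voice proportionally:
assert np.allclose(strong_joy - neutral, (mild_joy - neutral) / 0.3)
```

At synthesis time this is why an arbitrary style vector, including points never seen in training, yields intermediate or exaggerated styles: the mapping from style space to model means is linear by construction.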
Pages: 1406 - 1413
Number of pages: 8
Related papers
50 records in total
  • [31] STATISTICAL MODIFICATION BASED POST-FILTERING TECHNIQUE FOR HMM-BASED SPEECH SYNTHESIS
    Wen, Zhengqi
    Tao, Jianhua
    Che, Hao
    2012 8TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING, 2012, : 146 - 149
  • [32] HMM-BASED EXPRESSIVE SPEECH SYNTHESIS BASED ON PHRASE-LEVEL F0 CONTEXT LABELING
    Maeno, Yu
    Nose, Takashi
    Kobayashi, Takao
    Koriyama, Tomoki
    Ijima, Yusuke
    Nakajima, Hideharu
    Mizuno, Hideyuki
    Yoshioka, Osamu
    2013 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2013, : 7859 - 7863
  • [33] State duration modeling for HMM-based speech synthesis
    Zen, Heiga
    Masuko, Takashi
    Tokuda, Keiichi
    Yoshimura, Takayoshi
    Kobayashi, Takao
    Kitamura, Tadashi
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2007, E90D (03): : 692 - 693
  • [34] Analysis and HMM-based synthesis of hypo and hyperarticulated speech
    Picart, Benjamin
    Drugman, Thomas
    Dutoit, Thierry
    COMPUTER SPEECH AND LANGUAGE, 2014, 28 (02): : 687 - 707
  • [35] Optimal Number of States in HMM-Based Speech Synthesis
    Hanzlicek, Zdenek
    TEXT, SPEECH, AND DIALOGUE, TSD 2017, 2017, 10415 : 353 - 361
  • [36] Synthesis and evaluation of conversational characteristics in HMM-based speech synthesis
    Andersson, Sebastian
    Yamagishi, Junichi
    Clark, Robert A. J.
    SPEECH COMMUNICATION, 2012, 54 (02) : 175 - 188
  • [37] A trainable excitation model for HMM-based speech synthesis
    Maia, R.
    Toda, T.
    Zen, H.
    Nankaku, Y.
    Tokuda, K.
    INTERSPEECH 2007: 8TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION, VOLS 1-4, 2007, : 1125 - +
  • [38] Speaker interpolation for HMM-based speech synthesis system
    Yoshimura, Takayoshi, 2000, Acoustical Soc Jpn, Tokyo, Japan (21):
  • [39] Contextual Additive Structure for HMM-Based Speech Synthesis
    Takaki, Shinji
    Nankaku, Yoshihiko
    Tokuda, Keiichi
    IEEE JOURNAL OF SELECTED TOPICS IN SIGNAL PROCESSING, 2014, 8 (02) : 229 - 238
  • [40] Parameterization of Vocal Fry in HMM-Based Speech Synthesis
    Silen, Hanna
    Helander, Elina
    Nurminen, Jani
    Gabbouj, Moncef
    INTERSPEECH 2009: 10TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2009, VOLS 1-5, 2009, : 1735 - +