A style control technique for HMM-based expressive speech synthesis

Cited by: 90

Authors:

Nose, Takashi [1]

Yamagishi, Junichi [1]

Masuko, Takashi [1]

Kobayashi, Takao [1]

Affiliations:

[1] Tokyo Inst Technol, Interdisciplinary Grad Sch Sci & Engn, Yokohama, Kanagawa 2268502, Japan

Source:

IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS | 2007 / Vol. E90D / No. 09

Keywords:

HMM-based speech synthesis; speaking style; emotional expression; style interpolation; hidden semi-Markov model (HSMM); multiple-regression HSMM (MRHSMM);

DOI:

10.1093/ietisy/e90-d.9.1406

Chinese Library Classification:

TP [Automation technology, computer technology];

Subject classification code:

0812 ;

Abstract:

This paper describes a technique for controlling the degree of expressivity of a desired emotional expression and/or speaking style of synthesized speech in an HMM-based speech synthesis framework. With this technique, multiple emotional expressions and speaking styles of speech are modeled in a single model by using a multiple-regression hidden semi-Markov model (MRHSMM). A set of control parameters, called the style vector, is defined, and each speech synthesis unit is modeled by using the MRHSMM, in which mean parameters of the state output and duration distributions are expressed by multiple-regression of the style vector. In the synthesis stage, the mean parameters of the synthesis units are modified by transforming an arbitrarily given style vector that corresponds to a point in a low-dimensional space, called style space, each of whose coordinates represents a certain specific speaking style or emotion of speech. The results of subjective evaluation tests show that style and its intensity can be controlled by changing the style vector.
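The regression relationship in the abstract can be illustrated with a minimal sketch. It assumes the common multiple-regression form in which a mean vector is a regression matrix applied to the style vector augmented with a constant 1 (a bias term); the abstract does not give the exact formula, so that form is an assumption. The function name `regress_mean`, the matrix `H`, all numeric values, and the convention that the origin of the style space stands for a neutral style are hypothetical and chosen only for illustration.

```python
from typing import List

Vector = List[float]
Matrix = List[Vector]


def regress_mean(H: Matrix, style: Vector) -> Vector:
    """Return mu = H * [1, style...], a mean vector regressed on a style vector.

    Assumed form (not taken from the abstract): each row of H holds a bias
    coefficient followed by one coefficient per style-space coordinate.
    """
    xi = [1.0] + list(style)  # augmented style vector with a constant term
    for row in H:
        if len(row) != len(xi):
            raise ValueError("each row of H needs 1 + len(style) coefficients")
    return [sum(h * x for h, x in zip(row, xi)) for row in H]


# Hypothetical regression matrix: 2-dimensional mean, 2-dimensional style space.
H = [
    [5.0, 1.0, -0.5],
    [2.0, 0.0, 0.75],
]

neutral = regress_mean(H, [0.0, 0.0])      # origin of the style space
half_style = regress_mean(H, [0.5, 0.0])   # weaker degree of the first style
full_style = regress_mean(H, [1.0, 0.0])   # first style at its nominal point
emphasized = regress_mean(H, [1.5, 0.0])   # beyond the nominal point
```

In this sketch, moving the style vector along one coordinate shifts the mean linearly, which is one way to read the abstract's claim that style intensity can be controlled by changing the style vector. In the MRHSMM described above, the same kind of regression would apply to the means of both the state output and the duration distributions.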


Pages: 1406-1413

Page count: 8

