A style control technique for HMM-based expressive speech synthesis

Cited by: 90
Authors
Nose, Takashi [1 ]
Yamagishi, Junichi [1 ]
Masuko, Takashi [1 ]
Kobayashi, Takao [1 ]
Affiliations
[1] Tokyo Inst Technol, Interdisciplinary Grad Sch Sci & Engn, Yokohama, Kanagawa 2268502, Japan
Keywords
HMM-based speech synthesis; speaking style; emotional expression; style interpolation; hidden semi-Markov model (HSMM); multiple-regression HSMM (MRHSMM);
DOI
10.1093/ietisy/e90-d.9.1406
CLC (Chinese Library Classification) number
TP [Automation technology, computer technology];
Subject classification code
0812 ;
Abstract
This paper describes a technique for controlling the degree of expressivity of a desired emotional expression and/or speaking style of synthesized speech in an HMM-based speech synthesis framework. With this technique, multiple emotional expressions and speaking styles of speech are modeled in a single model by using a multiple-regression hidden semi-Markov model (MRHSMM). A set of control parameters, called the style vector, is defined, and each speech synthesis unit is modeled by using the MRHSMM, in which mean parameters of the state output and duration distributions are expressed by multiple-regression of the style vector. In the synthesis stage, the mean parameters of the synthesis units are modified by transforming an arbitrarily given style vector that corresponds to a point in a low-dimensional space, called style space, each of whose coordinates represents a certain specific speaking style or emotion of speech. The results of subjective evaluation tests show that style and its intensity can be controlled by changing the style vector.
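The core mechanism the abstract describes — state mean parameters expressed as a multiple regression of a style vector — can be illustrated with a small sketch. This is not the authors' implementation; the matrix `H`, the dimensions, and the `state_mean` helper are hypothetical, chosen only to show how a point in a low-dimensional style space linearly shifts the mean parameters, which is what makes style intensity controllable.

```python
# Illustrative sketch (assumed names and toy dimensions, not the paper's code):
# in an MRHSMM, the mean vector of each state's output (or duration)
# distribution is a linear function of a style vector v: mu = H @ [1; v].
import numpy as np

rng = np.random.default_rng(0)

dim = 4        # dimensionality of one state's mean parameter vector (toy value)
n_styles = 2   # style-space dimensions, e.g. a "joyful" axis and a "sad" axis

# Hypothetical regression matrix for one state: one bias column plus one
# column per style dimension; in the paper these are estimated in training.
H = rng.standard_normal((dim, 1 + n_styles))

def state_mean(style_vector):
    """Mean parameters as multiple regression of the style vector."""
    v = np.concatenate(([1.0], np.asarray(style_vector, dtype=float)))
    return H @ v  # mu = H @ [1; v]

neutral = state_mean([0.0, 0.0])      # origin of the style space
mild_joy = state_mean([0.3, 0.0])     # weak intensity along the first axis
strong_joy = state_mean([1.0, 0.0])   # full intensity along the first axis

# The mean moves linearly along each style axis, so scaling the style
# vector scales the deviation from the neutral voice proportionally:
assert np.allclose(strong_joy - neutral, (mild_joy - neutral) / 0.3)
```

At synthesis time this is why an arbitrary style vector, including points never seen in training, yields intermediate or exaggerated styles: the mapping from style space to model means is linear by construction.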
Pages: 1406 - 1413
Number of pages: 8
Related papers
50 records in total
  • [31] STATISTICAL MODIFICATION BASED POST-FILTERING TECHNIQUE FOR HMM-BASED SPEECH SYNTHESIS
    Wen, Zhengqi
    Tao, Jianhua
    Che, Hao
    2012 8TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING, 2012, : 146 - 149
  • [32] HMM-BASED EXPRESSIVE SPEECH SYNTHESIS BASED ON PHRASE-LEVEL F0 CONTEXT LABELING
    Maeno, Yu
    Nose, Takashi
    Kobayashi, Takao
    Koriyama, Tomoki
    Ijima, Yusuke
    Nakajima, Hideharu
    Mizuno, Hideyuki
    Yoshioka, Osamu
    2013 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2013, : 7859 - 7863
  • [33] State duration modeling for HMM-based speech synthesis
    Zen, Heiga
    Masuko, Takashi
    Tokuda, Keiichi
    Yoshimura, Takayoshi
    Kobayashi, Takao
    Kitamura, Tadashi
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2007, E90D (03): : 692 - 693
  • [34] Analysis and HMM-based synthesis of hypo and hyperarticulated speech
    Picart, Benjamin
    Drugman, Thomas
    Dutoit, Thierry
    COMPUTER SPEECH AND LANGUAGE, 2014, 28 (02): : 687 - 707
  • [35] Optimal Number of States in HMM-Based Speech Synthesis
    Hanzlicek, Zdenek
    TEXT, SPEECH, AND DIALOGUE, TSD 2017, 2017, 10415 : 353 - 361
  • [36] Synthesis and evaluation of conversational characteristics in HMM-based speech synthesis
    Andersson, Sebastian
    Yamagishi, Junichi
    Clark, Robert A. J.
    SPEECH COMMUNICATION, 2012, 54 (02) : 175 - 188
  • [37] A trainable excitation model for HMM-based speech synthesis
    Maia, R.
    Toda, T.
    Zen, H.
    Nankaku, Y.
    Tokuda, K.
    INTERSPEECH 2007: 8TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION, VOLS 1-4, 2007, : 1125 - +
  • [38] Speaker interpolation for HMM-based speech synthesis system
    Yoshimura, Takayoshi, 2000, Acoustical Soc Jpn, Tokyo, Japan (21):
  • [39] Contextual Additive Structure for HMM-Based Speech Synthesis
    Takaki, Shinji
    Nankaku, Yoshihiko
    Tokuda, Keiichi
    IEEE JOURNAL OF SELECTED TOPICS IN SIGNAL PROCESSING, 2014, 8 (02) : 229 - 238
  • [40] Parameterization of Vocal Fry in HMM-Based Speech Synthesis
    Silen, Hanna
    Helander, Elina
    Nurminen, Jani
    Gabbouj, Moncef
    INTERSPEECH 2009: 10TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2009, VOLS 1-5, 2009, : 1735 - +