A style control technique for HMM-based expressive speech synthesis

Cited by: 90
Authors
Nose, Takashi [1]
Yamagishi, Junichi [1]
Masuko, Takashi [1]
Kobayashi, Takao [1]
Affiliation
[1] Tokyo Inst Technol, Interdisciplinary Grad Sch Sci & Engn, Yokohama, Kanagawa 2268502, Japan
Keywords
HMM-based speech synthesis; speaking style; emotional expression; style interpolation; hidden semi-Markov model (HSMM); multiple-regression HSMM (MRHSMM)
DOI
10.1093/ietisy/e90-d.9.1406
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
This paper describes a technique for controlling the degree of expressivity of a desired emotional expression and/or speaking style of synthesized speech in an HMM-based speech synthesis framework. With this technique, multiple emotional expressions and speaking styles are modeled in a single model using a multiple-regression hidden semi-Markov model (MRHSMM). A set of control parameters, called the style vector, is defined, and each speech synthesis unit is modeled by an MRHSMM in which the mean parameters of the state output and duration distributions are expressed as a multiple regression on the style vector. In the synthesis stage, the mean parameters of the synthesis units are modified by transforming an arbitrarily given style vector that corresponds to a point in a low-dimensional space, called the style space, each of whose coordinates represents a specific speaking style or emotion. The results of subjective evaluation tests show that the style and its intensity can be controlled by changing the style vector.
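To make the regression described in the abstract concrete, here is a minimal Python sketch. It assumes the usual multiple-regression form with a bias term, mu = H xi with xi = [1, v_1, ..., v_L]^T, where v is the L-dimensional style vector and H is an M x (L + 1) regression matrix for one distribution. The function names (`extended_style_vector`, `regressed_mean`), the two-axis style space, the placement of the neutral style at the origin and all matrix values are hypothetical illustrations, not taken from the paper.

```python
# Minimal sketch (assumption): MRHSMM-style mean computation, mu = H @ xi,
# where xi = [1, v_1, ..., v_L] is the style vector with a bias term prepended.
# All names and numbers here are hypothetical, not the paper's values.
from typing import List, Sequence

Matrix = List[List[float]]


def extended_style_vector(style: Sequence[float]) -> List[float]:
    """Prepend the bias term 1.0 to the style vector v."""
    return [1.0, *style]


def regressed_mean(h: Matrix, style: Sequence[float]) -> List[float]:
    """Return mu = H xi for one distribution (state output or duration).

    h is an M x (L + 1) regression matrix: column 0 is the mean at the origin
    of the style space, and column k weights style coordinate k.
    """
    xi = extended_style_vector(style)
    for row in h:
        if len(row) != len(xi):
            raise ValueError("each row of H must have len(style) + 1 entries")
    return [sum(w * x for w, x in zip(row, xi)) for row in h]


if __name__ == "__main__":
    # Hypothetical 2-D style space (axis 0: "joyful", axis 1: "sad") and a
    # 2-dimensional mean vector, so H is 2 x 3. The neutral style is assumed
    # to sit at the origin for illustration.
    H = [
        [5.0, 0.5, -0.25],
        [1.0, 0.25, 0.125],
    ]
    print(regressed_mean(H, [0.0, 0.0]))  # neutral (origin): [5.0, 1.0]
    print(regressed_mean(H, [1.0, 0.0]))  # full "joyful":    [5.5, 1.25]
    print(regressed_mean(H, [0.5, 0.0]))  # half intensity:   [5.25, 1.125]
    print(regressed_mean(H, [1.0, 1.0]))  # mixed styles:     [5.25, 1.375]
```

Because the mean is affine in v in this sketch, moving halfway along a style axis gives exactly the midpoint of the neutral and full-style means, which is how changing the style vector changes style intensity; coordinates above 1 would extrapolate past the training style. The same form would apply to the means of the duration distributions.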
Pages: 1406-1413
Number of pages: 8