A style control technique for HMM-based expressive speech synthesis

Cited by: 90
Authors
Nose, Takashi [1 ]
Yamagishi, Junichi [1 ]
Masuko, Takashi [1 ]
Kobayashi, Takao [1 ]
Affiliation
[1] Tokyo Inst Technol, Interdisciplinary Grad Sch Sci & Engn, Yokohama, Kanagawa 2268502, Japan
Keywords
HMM-based speech synthesis; speaking style; emotional expression; style interpolation; hidden semi-Markov model (HSMM); multiple-regression HSMM (MRHSMM);
DOI
10.1093/ietisy/e90-d.9.1406
CLC number
TP [Automation Technology, Computer Technology];
Discipline code
0812;
Abstract
This paper describes a technique for controlling the degree of expressivity of a desired emotional expression and/or speaking style of synthesized speech in an HMM-based speech synthesis framework. With this technique, multiple emotional expressions and speaking styles of speech are modeled in a single model by using a multiple-regression hidden semi-Markov model (MRHSMM). A set of control parameters, called the style vector, is defined, and each speech synthesis unit is modeled by using the MRHSMM, in which the mean parameters of the state output and duration distributions are expressed as multiple regressions of the style vector. In the synthesis stage, the mean parameters of the synthesis units are modified according to an arbitrarily given style vector that corresponds to a point in a low-dimensional space, called the style space, each of whose coordinates represents a specific speaking style or emotion of speech. The results of subjective evaluation tests show that the style and its intensity can be controlled by changing the style vector.
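The abstract states that the state-output and duration means are expressed as multiple regressions of the style vector. A minimal sketch of such a formulation, with the symbols (ξ, H, v) assumed for illustration rather than taken from this record, is:

\[
\boldsymbol{\mu}_i = \mathbf{H}_{b,i}\,\boldsymbol{\xi}, \qquad
m_i = \mathbf{H}_{p,i}\,\boldsymbol{\xi}, \qquad
\boldsymbol{\xi} = [\,1,\ v_1,\ \dots,\ v_L\,]^{\top},
\]

where \(\mathbf{v} = (v_1, \dots, v_L)\) is the style vector (a point in the \(L\)-dimensional style space), \(\boldsymbol{\mu}_i\) and \(m_i\) are the mean parameters of the state-output and duration distributions of synthesis unit \(i\), and \(\mathbf{H}_{b,i}\), \(\mathbf{H}_{p,i}\) are regression matrices estimated from training data. Under this reading, choosing a different \(\mathbf{v}\) at synthesis time and recomputing the means shifts the speaking style and its intensity along the corresponding coordinates of the style space.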
Pages: 1406-1413
Page count: 8
Related papers
50 in total
  • [41] Speaking style adaptation using context clustering decision tree for HMM-based speech synthesis
    Yamagishi, J
    Tachibana, M
    Masuko, T
    Kobayashi, T
    2004 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL I, PROCEEDINGS: SPEECH PROCESSING, 2004, : 5 - 8
  • [42] Articulatory Control of HMM-based Parametric Speech Synthesis Driven by Phonetic Knowledge
    Ling, Zhen-Hua
    Richmond, Korin
    Yamagishi, Junichi
    Wang, Ren-Hua
    INTERSPEECH 2008: 9TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2008, VOLS 1-5, 2008, : 573 - +
  • [43] An HMM-based speech synthesis system applied to English
    Tokuda, K
    Zen, H
    Black, AW
    PROCEEDINGS OF THE 2002 IEEE WORKSHOP ON SPEECH SYNTHESIS, 2002, : 227 - 230
  • [44] The Design and Implementation of HMM-based Dai Speech Synthesis
    Wang, Zhan
    Yang, Jian
    Yang, Xin
    2016 10TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP), 2016,
  • [45] DIALOGUE CONTEXT SENSITIVE HMM-BASED SPEECH SYNTHESIS
    Tsiakoulis, Pirros
    Breslin, Catherine
    Gasic, Milica
    Henderson, Matthew
    Kim, Dongho
    Szummer, Martin
    Thomson, Blaise
    Young, Steve
    2014 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2014,
  • [46] Evaluation of the Slovenian HMM-based speech synthesis system
    Vesnicer, B
    Mihelic, F
    TEXT, SPEECH AND DIALOGUE, PROCEEDINGS, 2004, 3206 : 513 - 520
  • [47] HMM-based Tibetan Lhasa Speech Synthesis System
    Wu Zhiqiang
    Yu Hongzhi
    Li Guanyu
    Wan Shuhui
    2013 3RD INTERNATIONAL CONFERENCE ON COMPUTER SCIENCE AND NETWORK TECHNOLOGY (ICCSNT), 2013, : 92 - 95
  • [48] Statistical model training technique based on speaker clustering approach for HMM-based speech synthesis
    Ijima, Yusuke
    Miyazaki, Noboru
    Mizuno, Hideyuki
    Sakauchi, Sumitaka
    SPEECH COMMUNICATION, 2015, 71 : 50 - 61
  • [49] Evaluation of speech unit modelling for HMM-based speech synthesis for Arabic
    Houidhek, Amal
    Colotte, Vincent
    Mnasri, Zied
    Jouvet, Denis
    INTERNATIONAL JOURNAL OF SPEECH TECHNOLOGY, 2018, 21 (04) : 895 - 906
  • [50] SPEECH-LAUGHS: AN HMM-BASED APPROACH FOR AMUSED SPEECH SYNTHESIS
    El Haddad, Kevin
    Dupont, Stephane
    Urbain, Jerome
    Dutoit, Thierry
    2015 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING (ICASSP), 2015, : 4939 - 4943