Speaker Adaptation using Relevance Vector Regression for HMM-based Expressive TTS

被引:0
|
作者
Hong, Doo Hwa [1 ]
Lee, Joun Yeop
Jang, Se Young
Kim, Nam Soo
机构
[1] Seoul Natl Univ, Dept Elect & Comp Engn, Seoul, South Korea
关键词
speech synthesis; speaker adaptation; MLLR; relevance vector regression; LIKELIHOOD LINEAR-REGRESSION; KERNEL REGRESSION;
D O I
暂无
中图分类号
O42 [声学];
学科分类号
070206 ; 082403 ;
摘要
The conventional maximum likelihood linear regression (MLLR)-based adaptation algorithm employed to acoustic hidden Markov models (HMMs) is too restricted in linear regression to represent the details of mapping charateristics. To overcome this problem, we propose the relevance vector regression (RVR)-based model parameter adaptation technique. In this framework, the conventional technique is extended to have much more basis functions. Also, the weights for conducting a transform matrix are obtained by sparse Bayesian learning, in which most of the weights become zero due to the definition of the prior with the precision hyper-parameters. Furthermore, by using the appropriate kernel functions, RVR can take both of the advantages of linear and nonlinear regression. In the experiments, the emotional speech database is used for adaptation to evaluate the proposed method compared with the conventional constrained MLLR. From the experimental results, we conclude that the RVR adaption method performs better than the conventional method.
引用
收藏
页码:1216 / 1220
页数:5
相关论文
共 50 条
  • [11] CROSS-LINGUAL SPEAKER ADAPTATION FOR HMM-BASED SPEECH SYNTHESIS
    Wu, Yi-Jian
    King, Simon
    Tokuda, Keiichi
    2008 6TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING, PROCEEDINGS, 2008, : 9 - 12
  • [12] Formant-based Frequency Warping for Improving Speaker Adaptation in HMM TTS
    Zhuang, Xin
    Qian, Yao
    Soong, Frank
    Wu, Yijian
    Zhang, Bo
    11TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2010 (INTERSPEECH 2010), VOLS 1-2, 2010, : 817 - +
  • [13] UNSUPERVISED CROSS-LINGUAL SPEAKER ADAPTATION FOR HMM-BASED SPEECH SYNTHESIS
    Oura, Keiichiro
    Tokuda, Keiichi
    Yamagishi, Junichi
    King, Simon
    Wester, Mirjam
    2010 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2010, : 4594 - 4597
  • [14] Sinusoidal model parameterization for HMM-based TTS system
    Shechtman, Slava
    Sorin, Alex
    11TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2010 (INTERSPEECH 2010), VOLS 1-2, 2010, : 805 - 808
  • [15] A Perceptual Study of Acceleration Parameters in HMM-based TTS
    Chen, Yi-Ning
    Yan, Zhi-Jie
    Soong, Frank K.
    11TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2010 (INTERSPEECH 2010), VOLS 1-2, 2010, : 426 - +
  • [16] Analysis of Speaker Adaptation Algorithms for HMM-Based Speech Synthesis and a Constrained SMAPLR Adaptation Algorithm
    Yamagishi, Junichi
    Kobayashi, Takao
    Nakano, Yuji
    Ogata, Katsumi
    Isogai, Juri
    IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2009, 17 (01): : 66 - 83
  • [17] Measuring the gap between HMM-based ASR and TTS
    Dines, John
    Yamagishi, Junichi
    King, Simon
    INTERSPEECH 2009: 10TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2009, VOLS 1-5, 2009, : 1411 - +
  • [18] Measuring the Gap Between HMM-Based ASR and TTS
    Dines, John
    Yamagishi, Junichi
    King, Simon
    IEEE JOURNAL OF SELECTED TOPICS IN SIGNAL PROCESSING, 2010, 4 (06) : 1046 - 1058
  • [19] Cross-lingual Speaker Adaptation for HMM-based Speech Synthesis based on Perceptual Characteristics and Speaker Interpolation
    Oliveira, Viviane de Franca
    Shiota, Sayaka
    Nankaku, Yoshihiko
    Tokuda, Keiichi
    13TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2012 (INTERSPEECH 2012), VOLS 1-3, 2012, : 982 - 985
  • [20] Speaker adaptation method for acoustic-to-articulatory inversion using an HMM-based speech production model
    Hiroya, Sadao
    Honda, Masaaki
    IEICE Transactions on Information and Systems, 2004, E87-D (05) : 1071 - 1078