Applying generation process model constraint to fundamental frequency contours generated by hidden-Markov-model-based speech synthesis

Cited by: 4
Authors
Matsuda, Tetsuya [1]
Hirose, Keikichi [1]
Minematsu, Nobuaki [1]
Affiliations
[1] Univ Tokyo, Grad Sch Informat Sci & Technol, Bunkyo Ku, 7-3-1 Hongo, Tokyo 1138656, Japan
Keywords
F0 contour; Generation process model; HMM-based speech synthesis
DOI
10.1250/ast.33.221
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Speech synthesis based on hidden Markov models (HMMs) processes both segmental and prosodic features of speech together in a frame-by-frame manner. One benefit of this method is that the time alignment of both features is kept automatically. However, when the training data are limited, frame-by-frame representation is not appropriate for prosodic features, which are tightly related to speech units spanning a wide time range, such as words and phrases. This causes an inherent problem in fundamental frequency (F0) contour generation by HMM-based speech synthesis. A method is developed to modify F0 contours in the framework of the generation process model (henceforth, F0 model) by referring to linguistic information of the input text (word boundaries and accent types). It takes F0 variances obtained through HMM-based speech synthesis into account during the process. A listening experiment on synthetic speech shows that, on average, the method generates better quality than HMM-based speech synthesis. Since the F0 model can clearly relate its commands to linguistic (and para-/non-linguistic) information, the method has an additional advantage: changing speech styles and/or adding further information (such as emphasis) can be done easily by manipulating the commands.
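The abstract does not state the model equations. As a minimal sketch, assuming the "generation process model" is the widely used Fujisaki formulation (log F0 as a baseline plus phrase and accent components), the following Python shows how commands shape a fundamental frequency (F0) contour and how scaling an accent command could express emphasis. The function names, the constants alpha = 3.0, beta = 20.0, gamma = 0.9, and all command values are illustrative assumptions, not values taken from the paper.

```python
import math


def phrase_response(t, alpha=3.0):
    """Impulse response of the phrase component, Gp(t); zero before onset."""
    return alpha * alpha * t * math.exp(-alpha * t) if t >= 0 else 0.0


def accent_response(t, beta=20.0, gamma=0.9):
    """Step response of the accent component, Ga(t), capped at gamma."""
    if t < 0:
        return 0.0
    return min(1.0 - (1.0 + beta * t) * math.exp(-beta * t), gamma)


def f0_contour(times, base_f0, phrase_cmds, accent_cmds):
    """Return F0 in Hz at each time in `times` (seconds).

    phrase_cmds: list of (onset, magnitude) pairs
    accent_cmds: list of (onset, offset, amplitude) triples
    """
    contour = []
    for t in times:
        log_f0 = math.log(base_f0)
        for onset, magnitude in phrase_cmds:
            log_f0 += magnitude * phrase_response(t - onset)
        for onset, offset, amplitude in accent_cmds:
            log_f0 += amplitude * (
                accent_response(t - onset) - accent_response(t - offset)
            )
        contour.append(math.exp(log_f0))
    return contour


# Hypothetical commands: one phrase command and one accent command.
times = [i * 0.01 for i in range(200)]  # 2 s sampled every 10 ms
plain = f0_contour(times, 120.0, [(0.0, 0.5)], [(0.3, 0.7, 0.4)])
# Doubling the accent amplitude is one simple way to mimic emphasis.
emphatic = f0_contour(times, 120.0, [(0.0, 0.5)], [(0.3, 0.7, 0.8)])
```

In this sketch only the accent command amplitude differs between `plain` and `emphatic`, which illustrates the abstract's point that style or emphasis changes reduce to edits of a few command parameters rather than of every frame.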
Pages: 221-228
Page count: 8