Review of F0 modelling and generation in HMM based speech synthesis

Cited by: 0
Authors
Yu, Kai [1]
Affiliations
[1] Shanghai Jiao Tong Univ, Dept Comp Sci & Engn, Shanghai 200030, Peoples R China
Keywords
statistical speech synthesis; HMM based synthesis; F0; modelling
DOI
Not available
Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronic technology, communication technology]
Discipline classification code
0808; 0809
Abstract
Fundamental frequency, or F0, is a critical factor in synthesising speech that is both natural and expressive. In HMM based speech synthesis, the modelling and generation of F0 is one of the key difficulties that differentiate synthesis from recognition. Firstly, F0 values are normally considered a discontinuous function of time, whose domain is partly continuous and partly discrete. This raises two issues to be addressed in F0 modelling and generation: the voiced/unvoiced decision and the F0 trajectory. Secondly, F0 is supra-segmental, which means it should be modelled beyond the traditional phoneme level. Thirdly, the purpose of F0 modelling is not only to produce high quality synthetic speech in general, but also to deliver expressiveness effectively. This requires explicitly linking F0 modelling to (para/non-) linguistic information so that control of F0 is easy and feasible. This paper reviews the state-of-the-art frameworks that address these issues. Possible future research directions are also discussed.
Pages: 599-604
Number of pages: 6