Review of F0 modelling and generation in HMM based speech synthesis

Author
Yu, Kai [1]
Affiliation
[1] Shanghai Jiao Tong Univ, Dept Comp Sci & Engn, Shanghai 200030, Peoples R China
Keywords
statistical speech synthesis; HMM based synthesis; F0; modelling;
DOI
Not available
Chinese Library Classification (CLC)
TM [Electrical engineering]; TN [Electronic technology, communication technology]
Subject classification codes
0808; 0809
Abstract
Fundamental frequency, or F0, is a critical factor in synthesising speech that is both natural and expressive. In HMM based speech synthesis, the modelling and generation of F0 is one of the key difficulties that differentiate synthesis from recognition, for three reasons. Firstly, F0 is normally considered a discontinuous function of time: it takes continuous values in voiced regions and is undefined in unvoiced regions, so its observations are partly continuous and partly discrete. This raises two issues for F0 modelling and generation: the voiced/unvoiced decision and the F0 trajectory. Secondly, F0 is suprasegmental, which means it should be modelled beyond the traditional phoneme level. Thirdly, the purpose of F0 modelling is not only to produce high-quality synthetic speech in general, but also to convey expressiveness effectively. This requires F0 modelling to be explicitly linked to linguistic, paralinguistic and non-linguistic information so that controlling F0 is easy and practical. This paper reviews the state-of-the-art frameworks that address these issues and discusses possible future research directions.
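The abstract splits F0 modelling into a voiced/unvoiced decision and an F0 trajectory. The short Python sketch here is not taken from the paper; it only illustrates why that split arises. It assumes the common convention that unvoiced frames are stored as 0 Hz, and the helper names `split_f0` and `voiced_segments` are hypothetical. A frame-level F0 track then separates into a binary voicing sequence plus a log-F0 trajectory that exists only on voiced frames and is broken into disjoint segments.

```python
import math
from typing import List, Optional, Tuple


def split_f0(f0_hz: List[float]) -> Tuple[List[bool], List[Optional[float]]]:
    """Split a frame-level F0 track into a V/UV decision and a log-F0 trajectory.

    Assumption (hypothetical convention): unvoiced frames are stored as 0 Hz.
    """
    # Discrete part: one voiced/unvoiced flag per frame.
    voiced = [f > 0.0 for f in f0_hz]
    # Continuous part: log F0 is defined only on voiced frames; None marks
    # frames where the discrete "unvoiced" symbol applies instead.
    log_f0 = [math.log(f) if v else None for f, v in zip(f0_hz, voiced)]
    return voiced, log_f0


def voiced_segments(voiced: List[bool]) -> List[Tuple[int, int]]:
    """Return half-open (start, end) frame ranges of consecutive voiced frames."""
    segments = []
    start = None
    for i, v in enumerate(voiced):
        if v and start is None:
            start = i  # a voiced run begins
        elif not v and start is not None:
            segments.append((start, i))  # the run ends at an unvoiced frame
            start = None
    if start is not None:
        segments.append((start, len(voiced)))  # run reaches the last frame
    return segments


if __name__ == "__main__":
    f0 = [0.0, 0.0, 120.0, 125.0, 0.0, 110.0]
    voiced, log_f0 = split_f0(f0)
    print(voiced)                   # [False, False, True, True, False, True]
    print(voiced_segments(voiced))  # [(2, 4), (5, 6)]
```

Because the trajectory is defined only inside each voiced segment, a model cannot treat F0 as a single ordinary continuous stream, which is the modelling difficulty the abstract describes.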
Pages: 599-604
Page count: 6