Superpositional HMM-based intonation synthesis using a functional F0 model

Cited by: 0
Authors
Ni, Jinfu [1]
Shiga, Yoshinori [1]
Hori, Chiori [1]
Affiliations
[1] Natl Inst Informat & Commun Technol, Spoken Language Commun Lab, Univ Commun Res Inst, Kyoto, Japan
Keywords
Intonation synthesis; HMM-based speech synthesis; functional F0 model; making focal prominence; prosody; automatic extraction; speech
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
This paper addresses intonation synthesis that combines statistical and functional approaches, with manipulation of fundamental frequency (F0) contours in HMM-based speech synthesis. An F0 contour is represented as the sum of micro, accent, and register components on a logarithmic scale, a representation rooted in the Fujisaki model. Separate context-dependent (CD) HMMs are trained for each type of component, with the components extracted from a speech corpus using a functional F0 model. At synthesis time, the CDHMM-generated micro, accent, and register components are superimposed to form F0 contours for the input text. Objective and subjective evaluations are carried out on a Japanese speech corpus. Compared with the conventional approach, the method improves the naturalness of synthetic speech by achieving better global F0 behavior, and it also allows flexible intonation manipulation through modification of the functional model parameters.
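The superposition step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes each component is available as a per-frame track in the natural-log F0 domain, and all names and numbers (`superimpose_log_f0`, `scale_accent`, the 120 Hz register level) are hypothetical. Scaling the accent track here is only a simple stand-in for the paper's manipulation of functional model parameters, which the abstract does not detail.

```python
import math

def superimpose_log_f0(micro, accent, register):
    """Add three per-frame component tracks in the log-F0 domain; return F0 in Hz."""
    if not (len(micro) == len(accent) == len(register)):
        raise ValueError("component tracks must have the same number of frames")
    return [math.exp(m + a + r) for m, a, r in zip(micro, accent, register)]

def scale_accent(accent, factor):
    """Hypothetical manipulation: scale the accent component to change prominence."""
    return [factor * a for a in accent]

# Hypothetical 5-frame tracks (natural-log units), for illustration only.
register = [math.log(120.0)] * 5          # slowly varying baseline level
accent = [0.0, 0.1, 0.2, 0.1, 0.0]        # local accent-related rise and fall
micro = [0.0, 0.01, -0.01, 0.0, 0.0]      # small segmental perturbations

f0 = superimpose_log_f0(micro, accent, register)
f0_emphasized = superimpose_log_f0(micro, scale_accent(accent, 2.0), register)
```

Because the components add in the log domain, they combine multiplicatively in Hz, so doubling the accent track raises the peak frame by a factor of exp(0.2) relative to the unmodified contour while leaving the zero-accent frames unchanged.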
Pages: 270-274
Page count: 5