Review of F0 modelling and generation in HMM based speech synthesis

Cited by: 0

Authors:

Yu, Kai ^{[1]}

Affiliations:

[1] Shanghai Jiao Tong Univ, Dept Comp Sci & Engn, Shanghai 200030, Peoples R China

Source:

PROCEEDINGS OF 2012 IEEE 11TH INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING (ICSP) VOLS 1-3 | 2012

Keywords:

statistical speech synthesis; HMM based synthesis; F0; modelling;

DOI:

Not available

Chinese Library Classification (CLC) number:

TM [Electrical engineering]; TN [Electronic technology, communication technology];

Discipline classification code:

0808 ; 0809 ;

Abstract:

Fundamental frequency, or F0, is a critical factor in synthesising speech that is both natural and expressive. In HMM based speech synthesis, the modelling and generation of F0 is one of the key difficulties that differentiate synthesis from recognition. Firstly, F0 values are normally considered a discontinuous function of time, whose domain is partly continuous and partly discrete. This results in two issues to be addressed in F0 modelling and generation: the voiced/unvoiced decision and the F0 trajectory. Secondly, F0 is supra-segmental, which means it should be modelled beyond the traditional phoneme level. Thirdly, the purpose of F0 modelling is not only to produce high quality synthetic speech in general, but also to deliver expressiveness effectively. This requires explicitly linking F0 modelling to (para/non-) linguistic information so that the control of F0 is easy and feasible. This paper reviews the state-of-the-art frameworks that address these issues. Possible future research directions are also discussed.


Pages: 599 - 604

Number of pages: 6


[21] CROSS-STREAM DEPENDENCY MODELING USING CONTINUOUS F0 MODEL FOR HMM-BASED SPEECH SYNTHESIS
Wang, Xin
Ling, Zhen-Hua
Dai, Li-Rong
2012 8TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING, 2012, : 84 - 87
[22] HMM-BASED SPEECH SYNTHESIS WITH UNSUPERVISED LABELING OF ACCENTUAL CONTEXT BASED ON F0 QUANTIZATION AND AVERAGE VOICE MODEL
Nose, Takashi
Ooki, Koujirou
Kobayashi, Takao
2010 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2010, : 4622 - 4625
[23] F-0 contour generation and synthesis using Bengali Hmm-based speech synthesis system
Mukherjee, Sankar
Das Mandal, Shyamal Kumar
INTERNATIONAL JOURNAL OF SPEECH TECHNOLOGY, 2015, 18 (01) : 25 - 36
[24] F0 generation in a text-to-speech system using a database of natural F0 patterns
da Silva, CH
Nagle, EJ
Runstein, F
Violaro, F
ITS '98 PROCEEDINGS - SBT/IEEE INTERNATIONAL TELECOMMUNICATIONS SYMPOSIUM, VOLS 1 AND 2, 1998, : 213 - 218
[25] Superpositional HMM-based intonation synthesis using a functional F0 model
Ni, Jinfu
Shiga, Yoshinori
Hori, Chiori
2014 9TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP), 2014, : 270 - 274
[26] Superpositional HMM-Based Intonation Synthesis Using a Functional F0 Model
Ni, Jinfu
Shiga, Yoshinori
Hori, Chiori
JOURNAL OF SIGNAL PROCESSING SYSTEMS FOR SIGNAL IMAGE AND VIDEO TECHNOLOGY, 2016, 82 (02) : 273 - 286
[27] Superpositional HMM-Based Intonation Synthesis Using a Functional F0 Model
Jinfu Ni
Yoshinori Shiga
Chiori Hori
Journal of Signal Processing Systems, 2016, 82 : 273 - 286
[28] IMPROVED MODELING FOR F0 GENERATION AND V/U DECISION IN HMM-BASED TTS
Zhang, Qingqing
Soong, Frank
Qian, Yao
Yan, Zhijie
Pan, Jielin
Yan, Yonghong
2010 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2010, : 4606 - 4609
[29] A Minimum V/U Error Approach to F0 Generation in HMM-based TTS
Qian, Yao
Soong, Frank
Wang, Miaomiao
Wu, Zhizheng
INTERSPEECH 2009: 10TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2009, VOLS 1-5, 2009, : 400 - 403
[30] F0 analysis for Japanese conversational speech synthesis
Nakajima, Hideharu
Sagisaka, Yoshinori
2009 EIGHTH INTERNATIONAL SYMPOSIUM ON NATURAL LANGUAGE PROCESSING, PROCEEDINGS, 2009, : 137 - +
