Superpositional HMM-based intonation synthesis using a functional F0 model

Cited by: 0

Authors:

Ni, Jinfu [1]

Shiga, Yoshinori [1]

Hori, Chiori [1]

Affiliations:

[1] Natl Inst Informat & Commun Technol, Spoken Language Commun Lab, Univ Commun Res Inst, Kyoto, Japan

Source:

2014 9TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP) | 2014

Keywords:

Intonation synthesis; HMM-based speech synthesis; functional F0 model; making focal prominence; prosody; AUTOMATIC EXTRACTION; SPEECH;

DOI:

Not available

Chinese Library Classification number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

This paper addresses intonation synthesis that combines statistical and functional approaches, with manipulation of fundamental frequency (F0) contours, in HMM-based speech synthesis. An F0 contour is represented as a sum of micro, accent, and register components on a logarithmic scale, a representation rooted in the Fujisaki model. Separate context-dependent (CD) HMMs are trained for each type of component, extracted from a speech corpus on the basis of a functional F0 model. At synthesis time, the CD-HMM-generated micro, accent, and register components are superimposed to form the F0 contour for the input text. Objective and subjective evaluations are carried out on a Japanese speech corpus. Compared with the conventional approach, the method improves the naturalness of synthetic speech by achieving better global F0 behavior, and it allows flexible intonation manipulation by modifying the functional model parameters.
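
The superposition step described in the abstract can be illustrated with a minimal sketch. It assumes each component is already available as a per-frame value on a natural-log scale; the function name `superimpose_log_f0`, the `accent_gain` parameter, and all numeric values are hypothetical and are not taken from the paper, which generates the components with CD-HMMs and manipulates intonation through the functional model parameters rather than a simple gain.

```python
import math


def superimpose_log_f0(micro, accent, register, accent_gain=1.0):
    """Sum per-frame log-F0 components and return the contour in Hz.

    accent_gain is a hypothetical knob standing in for intonation
    manipulation; it is an assumption, not the paper's mechanism.
    """
    if not (len(micro) == len(accent) == len(register)):
        raise ValueError("components must have the same number of frames")
    return [
        math.exp(m + accent_gain * a + r)
        for m, a, r in zip(micro, accent, register)
    ]


# Toy values on a natural-log scale (illustrative only).
register = [math.log(120.0)] * 4   # slowly varying baseline level
accent = [0.0, 0.2, 0.2, 0.0]      # accent-related rise and fall
micro = [0.01, -0.01, 0.0, 0.01]   # small segmental perturbations

f0_hz = superimpose_log_f0(micro, accent, register)
emphasized = superimpose_log_f0(micro, accent, register, accent_gain=1.5)
```

Because the components add in the log domain, scaling the accent component changes F0 multiplicatively in Hz and leaves frames with a zero accent value untouched.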


Pages: 270-274

Page count: 5

