Speaker-specific retraining for enhanced compression of unit selection text-to-speech databases

Cited by: 0

Authors:

Nurminen, Jani [1]

Silen, Hanna [1]

Gabbouj, Moncef [1]

Affiliations:

[1] Tampere Univ Technol, Dept Signal Proc, Tampere, Finland

Source:

14TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2013), VOLS 1-5 | 2013

Keywords:

speech synthesis; unit selection; database compression; LPC parameters

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Unit selection based text-to-speech systems can generally obtain high speech quality provided that the database is large enough. In embedded applications, the related memory requirements may be excessive and often the database needs to be both pruned and compressed to fit it into the available memory space. In this paper, we study the topic of database compression. In particular, the focus is on speaker-specific optimization of the quantizers used in the database compression. First, we introduce the simple concept of dynamic quantizer structures, facilitating the use of speaker-specific optimizations by enabling convenient run-time updates. Second, we show that significant memory savings can be obtained through speaker-specific retraining while perfectly maintaining the quantization accuracy, even when the memory required for the additional codebook data is taken into account. Thus, the proposed approach can be considered effective in reducing the conventionally large footprint of unit selection based text-to-speech systems.


Pages: 388-391

Page count: 4

Related documents (50 items in total):

[41] Govorec(Speaker) Slovenian text-to-speech system for telecommunication applications
Sef, T
Gams, M
2002 6TH INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING PROCEEDINGS, VOLS I AND II, 2002, : 504 - 507
[42] Govorec (Speaker) - Slovenian text-to-speech synthesizer for various applications
Sef, T
Gams, M
6TH WORLD MULTICONFERENCE ON SYSTEMICS, CYBERNETICS AND INFORMATICS, VOL III, PROCEEDINGS: IMAGE, ACOUSTIC, SPEECH AND SIGNAL PROCESSING I, 2002, : 270 - 275
[43] Automatic prosodic modeling for speaker and task adaptation in text-to-speech
LopezGonzalo, E
RodriguezGarcia, JM
HernandezGomez, L
Villar, JM
1997 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS I - V: VOL I: PLENARY, EXPERT SUMMARIES, SPECIAL, AUDIO, UNDERWATER ACOUSTICS, VLSI; VOL II: SPEECH PROCESSING; VOL III: SPEECH PROCESSING, DIGITAL SIGNAL PROCESSING; VOL IV: MULTIDIMENSIONAL SIGNAL PROCESSING, NEURAL NETWORKS - VOL V: STATISTICAL SIGNAL AND ARRAY PROCESSING, APPLICATIONS, 1997, : 927 - 930
[44] A Small Footprint Hybrid Statistical and Unit Selection Text-to-Speech Synthesis System for Turkish
Guner, Ekrem
Demiroglu, Cenk
COMPUTER AND INFORMATION SCIENCES II, 2012, : 85 - 91
[45] RECENT IMPROVEMENTS OF PROBABILITY BASED PROSODY MODELS FOR UNIT SELECTION IN CONCATENATIVE TEXT-TO-SPEECH
Zhang, Wei
Gu, Liang
Gao, Yuqing
2009 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS 1- 8, PROCEEDINGS, 2009, : 3777 - 3780
[46] Enhanced quality text-to-speech for restricted domains
Unknown
BELL LABS TECHNICAL JOURNAL, 1997, 2 (04) : 169 - 170
[47] Refining Unit Boundaries for Mandarin Text-to-Speech Database
Dong, Minghui
Cen, Ling
Chan, Paul
Li, Haizhou
2009 INTERNATIONAL CONFERENCE ON ASIAN LANGUAGE PROCESSING, 2009, : 245 - 248
[48] Training Multi-Speaker Neural Text-to-Speech Systems using Speaker-Imbalanced Speech Corpora
Luong, Hieu-Thi
Wang, Xin
Yamagishi, Junichi
Nishizawa, Nobuyuki
INTERSPEECH 2019, 2019, : 1303 - 1307
[49] Deep Voice 2: Multi-Speaker Neural Text-to-Speech
Arik, Sercan O.
Diamos, Gregory
Gibiansky, Andrew
Miller, John
Peng, Kainan
Ping, Wei
Raiman, Jonathan
Zhou, Yanqi
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 30 (NIPS 2017), 2017, 30
[50] Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis
Jia, Ye
Zhang, Yu
Weiss, Ron J.
Wang, Quan
Shen, Jonathan
Ren, Fei
Chen, Zhifeng
Nguyen, Patrick
Pang, Ruoming
Moreno, Ignacio Lopez
Wu, Yonghui
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 31 (NIPS 2018), 2018, 31
