Speaker-specific retraining for enhanced compression of unit selection text-to-speech databases

Cited by: 0

Authors:

Nurminen, Jani ^{[1]}

Silen, Hanna ^{[1]}

Gabbouj, Moncef ^{[1]}

Affiliations:

[1] Tampere Univ Technol, Dept Signal Proc, Tampere, Finland

Source:

14TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2013), VOLS 1-5 | 2013

Keywords:

speech synthesis; unit selection; database compression; LPC PARAMETERS

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Unit selection based text-to-speech systems can generally obtain high speech quality provided that the database is large enough. In embedded applications, the related memory requirements may be excessive and often the database needs to be both pruned and compressed to fit it into the available memory space. In this paper, we study the topic of database compression. In particular, the focus is on speaker-specific optimization of the quantizers used in the database compression. First, we introduce the simple concept of dynamic quantizer structures, facilitating the use of speaker-specific optimizations by enabling convenient run-time updates. Second, we show that significant memory savings can be obtained through speaker-specific retraining while perfectly maintaining the quantization accuracy, even when the memory required for the additional codebook data is taken into account. Thus, the proposed approach can be considered effective in reducing the conventionally large footprint of unit selection based text-to-speech systems.

Pages: 388-391

Number of pages: 4

