Speaker-specific retraining for enhanced compression of unit selection text-to-speech databases

Authors
Nurminen, Jani [1]
Silen, Hanna [1]
Gabbouj, Moncef [1]
Affiliation
[1] Tampere Univ Technol, Dept Signal Proc, Tampere, Finland
Keywords
speech synthesis; unit selection; database compression; LPC parameters
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Unit-selection-based text-to-speech systems can generally achieve high speech quality, provided that the database is large enough. In embedded applications, however, the resulting memory requirements may be excessive, and the database often needs to be both pruned and compressed to fit into the available memory. In this paper, we study database compression, focusing on speaker-specific optimization of the quantizers used to compress the database. First, we introduce the simple concept of dynamic quantizer structures, which facilitate speaker-specific optimization by allowing the quantizers to be conveniently updated at run time. Second, we show that speaker-specific retraining yields significant memory savings while fully maintaining quantization accuracy, even when the memory required for the additional codebook data is taken into account. The proposed approach can thus be considered effective in reducing the conventionally large footprint of unit-selection-based text-to-speech systems.
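The two ideas in the abstract, a quantizer whose codebook can be replaced at run time and speaker-specific retraining whose benefit must outweigh the storage cost of the extra codebook, can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: plain k-means stands in for whatever codebook training the paper uses, and `DynamicVQ`, `train_kmeans`, `storage_bits`, the 16-bit codebook entries, and all example numbers are hypothetical.

```python
import random
from dataclasses import dataclass

Vector = tuple[float, ...]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


@dataclass
class DynamicVQ:
    """Hypothetical 'dynamic' vector quantizer.

    The codebook is ordinary run-time data rather than a compiled-in table,
    so a speaker-specific codebook can be swapped in without rebuilding.
    """
    codebook: list[Vector]

    def encode(self, x: Vector) -> int:
        # Index of the nearest codeword (full search).
        return min(range(len(self.codebook)),
                   key=lambda i: sq_dist(x, self.codebook[i]))

    def decode(self, index: int) -> Vector:
        return self.codebook[index]

    def load_codebook(self, new_codebook: list[Vector]) -> None:
        # Run-time update, e.g. with a codebook retrained for one speaker.
        self.codebook = [tuple(c) for c in new_codebook]


def train_kmeans(data: list[Vector], k: int, iters: int = 20,
                 seed: int = 0) -> list[Vector]:
    """Plain k-means (Lloyd iterations) codebook training.

    Assumed stand-in for the paper's training procedure.
    """
    rng = random.Random(seed)
    centers = rng.sample(data, k)  # initialize from k distinct training vectors
    for _ in range(iters):
        clusters: list[list[Vector]] = [[] for _ in range(k)]
        for x in data:
            nearest = min(range(k), key=lambda i: sq_dist(x, centers[i]))
            clusters[nearest].append(x)
        # Move each center to its cluster mean; keep it if the cluster is empty.
        centers = [tuple(sum(col) / len(c) for col in zip(*c)) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers


def mean_distortion(vq: DynamicVQ, data: list[Vector]) -> float:
    """Average squared quantization error over a data set."""
    return sum(sq_dist(x, vq.decode(vq.encode(x))) for x in data) / len(data)


def storage_bits(n_vectors: int, codebook_size: int, dim: int,
                 entry_bits: int = 16, include_codebook: bool = True) -> int:
    """Bits to store n_vectors indices, plus optionally the codebook itself."""
    index_bits = (codebook_size - 1).bit_length()  # ceil(log2(codebook_size))
    bits = n_vectors * index_bits
    if include_codebook:
        bits += codebook_size * dim * entry_bits
    return bits


if __name__ == "__main__":
    # Toy data: one speaker's vectors lie away from a generic codebook.
    vq = DynamicVQ([(0.0, 0.0), (10.0, 10.0)])
    speaker = [(3.0, 3.0), (3.2, 3.0), (7.0, 7.0), (7.0, 7.2)]
    print("generic codebook:", mean_distortion(vq, speaker))
    vq.load_codebook(train_kmeans(speaker, k=2))
    print("retrained codebook:", mean_distortion(vq, speaker))
```

On this toy data, retraining lowers the average distortion from about 18.02 to about 0.01 at the same codebook size. For the memory trade-off, with hypothetical figures: 100,000 ten-dimensional vectors coded with a shared 1024-entry codebook need `storage_bits(100_000, 1024, 10, include_codebook=False)` = 1,000,000 bits of indices, while a retrained 256-entry speaker-specific codebook needs 800,000 bits of indices plus 40,960 bits for the codebook itself, 840,960 bits in total. The saving is only real if the smaller retrained codebook preserves the accuracy of the larger generic one, which is what the abstract reports for the proposed retraining.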
Pages: 388-391
Number of pages: 4