Speaker-specific retraining for enhanced compression of unit selection text-to-speech databases

Cited: 0
Authors
Nurminen, Jani [1 ]
Silen, Hanna [1 ]
Gabbouj, Moncef [1 ]
Affiliations
[1] Tampere Univ Technol, Dept Signal Proc, Tampere, Finland
Keywords
speech synthesis; unit selection; database compression; LPC parameters
DOI
none available
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
Unit selection based text-to-speech systems can generally achieve high speech quality provided that the database is large enough. In embedded applications, however, the associated memory requirements may be excessive, and the database often needs to be both pruned and compressed to fit into the available memory. In this paper, we study database compression; in particular, the focus is on speaker-specific optimization of the quantizers used in the compression. First, we introduce the simple concept of dynamic quantizer structures, which facilitates speaker-specific optimization by enabling convenient run-time updates. Second, we show that significant memory savings can be obtained through speaker-specific retraining while perfectly maintaining the quantization accuracy, even when the memory required for the additional codebook data is taken into account. The proposed approach can thus be considered effective in reducing the conventionally large footprint of unit selection based text-to-speech systems.
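The abstract does not give the details of the paper's dynamic quantizer structures, but the core idea of speaker-specific codebook retraining can be illustrated with a minimal, hypothetical sketch: a vector quantizer codebook trained on pooled multi-speaker parameter data is replaced by a smaller codebook retrained on one speaker's (typically narrower) parameter distribution, which can match or improve quantization accuracy with fewer codewords. The k-means training below and the synthetic data are assumptions for illustration, not the paper's method.

```python
import numpy as np

def train_codebook(data, k, iters=20, seed=0):
    """Train a VQ codebook with plain k-means (illustrative stand-in
    for a real quantizer training procedure)."""
    rng = np.random.default_rng(seed)
    codebook = data[rng.choice(len(data), size=k, replace=False)].copy()
    for _ in range(iters):
        # Assign each vector to its nearest codeword (Euclidean distance).
        dists = np.linalg.norm(data[:, None, :] - codebook[None, :, :], axis=2)
        assign = dists.argmin(axis=1)
        for j in range(k):
            members = data[assign == j]
            if len(members):  # keep the old codeword if a cell is empty
                codebook[j] = members.mean(axis=0)
    return codebook

def quant_error(data, codebook):
    """Mean distance from each vector to its nearest codeword."""
    dists = np.linalg.norm(data[:, None, :] - codebook[None, :, :], axis=2)
    return dists.min(axis=1).mean()

# Synthetic stand-ins: pooled multi-speaker parameters vs. one speaker's
# narrower distribution (both hypothetical).
rng = np.random.default_rng(1)
generic = rng.normal(0.0, 1.0, size=(2000, 10))   # pooled training data
speaker = rng.normal(0.5, 0.3, size=(500, 10))    # one speaker's data

generic_cb = train_codebook(generic, k=32)     # larger, generic codebook
retrained_cb = train_codebook(speaker, k=16)   # smaller, speaker-specific

e_generic = quant_error(speaker, generic_cb)
e_retrained = quant_error(speaker, retrained_cb)
print(f"generic (32 codewords):   {e_generic:.3f}")
print(f"retrained (16 codewords): {e_retrained:.3f}")
```

In this sketch the retrained codebook halves the codebook size yet quantizes the speaker's data more accurately, which mirrors the paper's claim that retraining pays for the extra codebook data it adds to the database.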
Pages: 388-391 (4 pages)
Related papers (50 total)
  • [31] A Unit Selection Text-to-Speech Synthesis System Optimized for Use with Screen Readers
    Chalamandaris, Aimilios
    Karabetsos, Sotiris
    Tsiakoulis, Pirros
    Raptis, Spyros
    IEEE TRANSACTIONS ON CONSUMER ELECTRONICS, 2010, 56 (03) : 1890 - 1897
  • [32] A global, boundary-centric framework for unit selection text-to-speech synthesis
    Bellegarda, JR
    IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2006, 14 (03): : 990 - 997
  • [33] Multimodal Unit Selection for 2D Audiovisual Text-to-Speech Synthesis
    Mattheyses, Wesley
    Latacz, Lukas
    Verhelst, Werner
    Sahli, Hichem
    MACHINE LEARNING FOR MULTIMODAL INTERACTION, PROCEEDINGS, 2008, 5237 : 125 - 136
  • [34] Applying Scalable Phonetic Context Similarity in Unit Selection of Concatenative Text-to-Speech
    Zhang, Wei
    Cui, Xiaodong
    11TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2010 (INTERSPEECH 2010), VOLS 1-2, 2010, : 154 - 157
  • [35] Scalable implementation of unit selection based text-to-speech system for embedded solutions
    Nukaga, Nobuo
    Kamoshida, Ryota
    Nagamatsu, Kenji
    Kitahara, Yoshinori
    2006 IEEE International Conference on Acoustics, Speech and Signal Processing, Vols 1-13, 2006, : 849 - 852
  • [36] An efficient unit-selection method for concatenative Text-to-speech synthesis systems
    Gros, Jerneja Zganec
    Zganec, Mario
    Journal of Computing and Information Technology, 2008, 16 (01) : 69 - 78
  • [37] Unit-centric feature mapping for inventory pruning in unit selection text-to-speech synthesis
    Bellegarda, Jerome R.
    IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2008, 16 (01): : 74 - 82
  • [38] A Comparison of Speaker-based and Utterance-based Data Selection for Text-to-Speech Synthesis
    Lee, Kai-Zhan
    Cooper, Erica
    Hirschberg, Julia
    19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018, : 2873 - 2877
  • [39] On the Construction of Unit Databanks for Text-to-Speech Systems
    Latsch, Vagner L.
    Netto, Sergio L.
    PROCEEDINGS OF THE IEEE INTERNATIONAL TELECOMMUNICATIONS SYMPOSIUM, VOLS 1 AND 2, 2006, : 340 - 343
  • [40] Exploiting Emotion Information in Speaker Embeddings for Expressive Text-to-Speech
    Shaheen, Zein
    Sadekova, Tasnima
    Matveeva, Yulia
    Shirshova, Alexandra
    Kudinov, Mikhail
    INTERSPEECH 2023, 2023, : 2038 - 2042