Exemplar-based speech waveform generation

Cited by: 0
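Authors
Watts, Oliver [1]
Valentini-Botinhao, Cassia [1]
Espic, Felipe [1]
King, Simon [1]
Affiliations
[1] Univ Edinburgh, Ctr Speech Technol Res, Edinburgh, Midlothian, Scotland
Funding
Engineering and Physical Sciences Research Council (EPSRC), UK
Keywords
speech synthesis; vocoder; unit selection
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract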
This paper presents a simple but effective method for generating speech waveforms by selecting small units of stored speech to match a low-dimensional target representation. The method is designed as a drop-in replacement for the vocoder in a deep neural network-based text-to-speech system. Most previous work on hybrid unit selection waveform generation relies on phonetic annotation for determining unit boundaries, or for specifying target cost, or for candidate preselection. In contrast, our waveform generator requires no phonetic information, annotation, or alignment. Unit boundaries are determined by epochs, and spectral analysis provides representations which are compared directly with target features at runtime. As in unit selection, we minimise a combination of target cost and join cost, but find that greedy left-to-right nearest-neighbour search gives similar results to dynamic programming. The method is fast and can generate the waveform incrementally. We use publicly available data and provide a permissively licensed open-source toolkit for reproducing our results.
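The greedy left-to-right search mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' toolkit: the function names `sq_dist` and `greedy_select`, the `join_weight` parameter, the squared Euclidean distance, and the particular join cost (distance between a candidate and the unit that followed the previous choice in the stored speech) are all assumptions made for illustration, since the abstract does not define the costs.

```python
import math


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def greedy_select(targets, units, join_weight=1.0):
    """Pick one stored unit index per target frame, left to right.

    targets: list of target feature vectors (one per output unit).
    units:   list of stored unit feature vectors, in recording order.
    Both cost definitions are illustrative assumptions, not the paper's.
    """
    chosen = []
    for target in targets:
        best_index, best_cost = None, math.inf
        for i, feat in enumerate(units):
            # Target cost (assumed): distance from candidate to the target.
            cost = sq_dist(target, feat)
            if chosen:
                # Join cost (assumed): distance from the candidate to the unit
                # that naturally followed the previous choice, so consecutive
                # stored units join at zero cost. The last stored unit has no
                # successor, so it is compared with itself.
                successor = min(chosen[-1] + 1, len(units) - 1)
                cost += join_weight * sq_dist(units[successor], feat)
            if cost < best_cost:
                best_index, best_cost = i, cost
        # Commit immediately: no backtracking, so output can be incremental.
        chosen.append(best_index)
    return chosen


units = [[0.0], [1.0], [2.0], [10.0]]
targets = [[0.1], [1.2], [9.0]]
print(greedy_select(targets, units))                   # [0, 1, 2]
print(greedy_select(targets, units, join_weight=0.0))  # [0, 1, 3]
```

In this toy example the join cost keeps the selection on consecutive stored units, while setting `join_weight` to zero reduces the search to plain nearest-neighbour matching against the target. Because each choice is committed as soon as it is made, a search of this shape can emit units incrementally, unlike dynamic programming over the whole sequence.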
Pages: 2022-2026
Page count: 5