Exemplar-based speech waveform generation

Cited by: 0

Authors:

Watts, Oliver [1]

Valentini-Botinhao, Cassia [1]

Espic, Felipe [1]

King, Simon [1]

Affiliations:

[1] Univ Edinburgh, Ctr Speech Technol Res, Edinburgh, Midlothian, Scotland

Source:

19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES | 2018

Funding:

Engineering and Physical Sciences Research Council (EPSRC), UK;

Keywords:

speech synthesis; vocoder; unit selection;

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

This paper presents a simple but effective method for generating speech waveforms by selecting small units of stored speech to match a low-dimensional target representation. The method is designed as a drop-in replacement for the vocoder in a deep neural network-based text-to-speech system. Most previous work on hybrid unit selection waveform generation relies on phonetic annotation for determining unit boundaries, or for specifying target cost, or for candidate preselection. In contrast, our waveform generator requires no phonetic information, annotation, or alignment. Unit boundaries are determined by epochs, and spectral analysis provides representations which are compared directly with target features at runtime. As in unit selection, we minimise a combination of target cost and join cost, but find that greedy left-to-right nearest-neighbour search gives similar results to dynamic programming. The method is fast and can generate the waveform incrementally. We use publicly available data and provide a permissively-licensed open source toolkit for reproducing our results.
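The greedy left-to-right search mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' toolkit. The unit fields (`features`, `start`, `end`), the use of Euclidean distance for both costs, and the `join_weight` parameter are all assumptions made for illustration only.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def greedy_select(targets, units, join_weight=1.0):
    """Pick one stored unit per target frame, strictly left to right.

    Hypothetical data layout (an assumption, not taken from the paper):
      targets: list of target feature vectors, one per frame
      units:   list of dicts with keys "features" (compared with the target),
               "start" and "end" (compared across a join)
    Returns the indices of the chosen units.
    """
    chosen = []
    prev = None  # index of the unit chosen for the previous frame
    for target in targets:
        best_index, best_cost = None, math.inf
        for index, unit in enumerate(units):
            # Target cost: how well the stored unit matches the target frame.
            cost = euclidean(target, unit["features"])
            # Join cost: how smoothly the unit follows the previous choice.
            if prev is not None:
                cost += join_weight * euclidean(units[prev]["end"], unit["start"])
            if cost < best_cost:
                best_index, best_cost = index, cost
        chosen.append(best_index)
        prev = best_index  # commit immediately; no backtracking
    return chosen


# Toy one-dimensional example data (made up for illustration).
example_units = [
    {"features": [0.0], "start": [0.0], "end": [0.0]},
    {"features": [1.0], "start": [0.2], "end": [1.0]},
    {"features": [1.1], "start": [3.0], "end": [1.1]},
]
example_targets = [[0.0], [1.08]]
```

With `join_weight=0.0` the search reduces to plain nearest-neighbour matching on the target cost; raising the weight favours units that join smoothly to the previous choice. Because each choice is committed immediately, the output can be produced incrementally, whereas dynamic programming would need the whole target sequence before backtracking.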


Pages: 2022-2026

Page count: 5

