Exemplar-based speech waveform generation

Cited by: 0

Authors:

Watts, Oliver [1]

Valentini-Botinhao, Cassia [1]

Espic, Felipe [1]

King, Simon [1]

Affiliations:

[1] University of Edinburgh, Centre for Speech Technology Research, Edinburgh, Midlothian, Scotland

Source:

19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES | 2018

Funding:

Engineering and Physical Sciences Research Council (EPSRC), UK;

Keywords:

speech synthesis; vocoder; unit selection;

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

This paper presents a simple but effective method for generating speech waveforms by selecting small units of stored speech to match a low-dimensional target representation. The method is designed as a drop-in replacement for the vocoder in a deep neural network-based text-to-speech system. Most previous work on hybrid unit selection waveform generation relies on phonetic annotation for determining unit boundaries, or for specifying target cost, or for candidate preselection. In contrast, our waveform generator requires no phonetic information, annotation, or alignment. Unit boundaries are determined by epochs, and spectral analysis provides representations which are compared directly with target features at runtime. As in unit selection, we minimise a combination of target cost and join cost, but find that greedy left-to-right nearest-neighbour search gives similar results to dynamic programming. The method is fast and can generate the waveform incrementally. We use publicly available data and provide a permissively-licensed open source toolkit for reproducing our results.
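The greedy left-to-right search mentioned in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' toolkit: the function names, the tuple layout of a stored unit, the use of squared Euclidean distance for both costs, and the `join_weight` parameter are all hypothetical.

```python
import math


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def greedy_select(targets, units, join_weight=1.0):
    """Pick one stored unit per target frame, working left to right.

    targets: list of target feature vectors, one per frame to generate.
    units:   list of (features, join_start, join_end) tuples. This layout
             is an assumption made for the sketch only.
    Returns the list of chosen unit indices.
    """
    chosen = []
    prev_end = None  # join features at the end of the previously chosen unit
    for target in targets:
        best_index, best_cost = None, math.inf
        for index, (features, join_start, _join_end) in enumerate(units):
            cost = sq_dist(target, features)  # target cost
            if prev_end is not None:
                # join cost: mismatch between previous unit's end and this start
                cost += join_weight * sq_dist(prev_end, join_start)
            if cost < best_cost:
                best_index, best_cost = index, cost
        chosen.append(best_index)
        prev_end = units[best_index][2]
    return chosen


# Toy database: unit 1 matches the second target exactly but joins badly
# with unit 0; unit 2 is a slightly worse match that joins smoothly.
units = [
    ([0.0], [0.0], [0.0]),
    ([1.0], [9.0], [9.0]),
    ([1.2], [0.0], [0.0]),
]
targets = [[0.0], [1.0]]
selection = greedy_select(targets, units, join_weight=1.0)
```

Because each choice depends only on the previously selected unit, a search of this shape can emit output frame by frame, which is consistent with the incremental generation the abstract describes. Setting `join_weight` to zero in the toy example reduces the search to plain nearest-neighbour matching on the target cost.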

Citation

Pages: 2022-2026

Page count: 5
