Zero-Shot Voice Conditioning for Denoising Diffusion TTS Models

Cited by: 2

Authors:

Levkovitch, Alon [1]

Nachmani, Eliya [1,2]

Wolf, Lior [1]

Affiliations:

[1] Tel Aviv Univ, Tel Aviv, Israel

[2] Facebook AI Res, Tel Aviv, Israel

Source:

INTERSPEECH 2022 | 2022

Funding:

European Research Council

Keywords:

DOI:

10.21437/Interspeech.2022-10045

Chinese Library Classification (CLC) number:

O42 [Acoustics]

Subject classification codes:

070206; 082403

Abstract:

We present a novel way of conditioning a pretrained denoising diffusion speech model to produce speech in the voice of a novel person unseen during training. The method requires only a short (approximately 3 seconds) sample from the target person, and generation is steered at inference time, without any training steps. At the heart of the method lies a sampling process that combines the estimation of the denoising model with a low-pass version of the new speaker's sample. The objective and subjective evaluations show that our sampling method can generate a voice similar to that of the target speaker in terms of frequency, with an accuracy comparable to state-of-the-art methods, and without training.
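
The abstract does not spell out the sampling rule, so the following is only a minimal sketch of one plausible reading of "combining the estimation of the denoising model with a low-pass version of the new speaker's sample": the low-frequency band of the model's estimate is swapped for the low-frequency band of the reference. The brick-wall FFT filter, the function names `low_pass` and `combine`, the `keep_bins` parameter, and the random stand-in signals are all assumptions made for illustration and are not taken from the paper.

```python
import numpy as np


def low_pass(x, keep_bins):
    """Brick-wall low-pass (assumed filter): keep the lowest rFFT bins, zero the rest."""
    spectrum = np.fft.rfft(x)
    spectrum[keep_bins:] = 0.0
    return np.fft.irfft(spectrum, n=len(x))


def combine(model_estimate, reference, keep_bins):
    """Hypothetical mixing step: replace the low band of the model's
    estimate with the low band of the reference signal."""
    return (
        model_estimate
        - low_pass(model_estimate, keep_bins)
        + low_pass(reference, keep_bins)
    )


rng = np.random.default_rng(0)
model_estimate = rng.standard_normal(256)  # stand-in for one denoising step's output
reference = rng.standard_normal(256)       # stand-in for the target speaker's sample
mixed = combine(model_estimate, reference, keep_bins=16)
```

Because this particular filter is linear and idempotent, `mixed` shares its low band with `reference` and its high band with `model_estimate`. In an actual diffusion sampler a step of this kind would presumably be applied at each reverse iteration, but the abstract alone does not confirm the signal domain, the filter, or the schedule.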


Pages: 2983-2987

Number of pages: 5
