Zero-Shot Voice Conditioning for Denoising Diffusion TTS Models

Cited by: 2
Authors
Levkovitch, Alon [1]
Nachmani, Eliya [1,2]
Wolf, Lior [1]
Affiliations
[1] Tel Aviv Univ, Tel Aviv, Israel
[2] Facebook AI Res, Tel Aviv, Israel
Funding
European Research Council
DOI
10.21437/Interspeech.2022-10045
Chinese Library Classification
O42 [Acoustics]
Subject Classification Codes
070206; 082403
Abstract
We present a novel way of conditioning a pretrained denoising diffusion speech model to produce speech in the voice of a new person unseen during training. The method requires only a short (about 3 seconds) sample from the target speaker, and generation is steered at inference time without any training steps. At the heart of the method lies a sampling process that combines the estimate of the denoising model with a low-pass version of the new speaker's sample. Objective and subjective evaluations show that our sampling method can generate a voice similar to that of the target speaker in terms of frequency, with an accuracy comparable to state-of-the-art methods, and without training.
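The abstract describes the sampler only at a high level. The sketch below shows one way such a step could look, modeled on the ILVR-style update x_{t-1} = x'_{t-1} - LP(x'_{t-1}) + LP(y_{t-1}), where x'_{t-1} is the model's proposal and y_{t-1} is the reference sample noised to the same level. This is an assumption, not the authors' exact procedure: the filter (here a block-average-and-repeat stand-in), the noise schedule, and the feature space (e.g. mel-spectrogram frames versus a 1-D signal) are not specified in this record. `low_pass`, `refine_step`, `guided_sample`, and `denoise_step` are hypothetical names.

```python
import math
import random
from typing import Callable, List, Sequence


def low_pass(x: Sequence[float], factor: int) -> List[float]:
    """Crude low-pass filter: average non-overlapping blocks of `factor`
    samples, then repeat each block mean `factor` times.

    Stand-in assumption: the paper's actual filter is not specified here.
    Requires len(x) to be a positive multiple of `factor`.
    """
    if factor < 1 or len(x) % factor != 0:
        raise ValueError("len(x) must be a positive multiple of factor")
    out: List[float] = []
    for i in range(0, len(x), factor):
        block_mean = sum(x[i:i + factor]) / factor
        out.extend([block_mean] * factor)
    return out


def refine_step(
    proposal: Sequence[float],
    reference: Sequence[float],
    alpha_bar: float,
    factor: int,
    rng: random.Random,
) -> List[float]:
    """ILVR-style refinement (assumed): keep the proposal's high-frequency
    part and replace its low-frequency part with that of the reference,
    noised to the same diffusion level (signal fraction `alpha_bar`)."""
    ref_noisy = [
        math.sqrt(alpha_bar) * r + math.sqrt(1.0 - alpha_bar) * rng.gauss(0.0, 1.0)
        for r in reference
    ]
    lp_prop = low_pass(proposal, factor)
    lp_ref = low_pass(ref_noisy, factor)
    return [p - lp + lr for p, lp, lr in zip(proposal, lp_prop, lp_ref)]


def guided_sample(
    x_T: Sequence[float],
    reference: Sequence[float],
    denoise_step: Callable[[List[float], int], List[float]],
    alpha_bars: Sequence[float],
    factor: int,
    seed: int = 0,
) -> List[float]:
    """Reverse diffusion loop with reference guidance after every step.

    `denoise_step(x, t)` is a hypothetical stand-in for one reverse step of
    the pretrained model; `alpha_bars[t]` is the signal level its output
    should have (alpha_bars[0] == 1.0 means a clean sample).
    """
    rng = random.Random(seed)
    x = list(x_T)
    for t in reversed(range(len(alpha_bars))):
        proposal = denoise_step(x, t)
        x = refine_step(proposal, reference, alpha_bars[t], factor, rng)
    return x
```

Because the filter is a linear projection, each refined sample's low-pass component equals that of the noised reference, while its high-frequency residual comes from the model. For example, `refine_step([1.0, 2.0, 3.0, 4.0], [10.0, 10.0, 20.0, 20.0], 1.0, 2, random.Random(0))` keeps the proposal's within-block deviations (±0.5) around the reference's block means (10 and 20).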
Pages: 2983-2987
Page count: 5