Conditional Score Guidance for Text-Driven Image-to-Image Translation

Cited by: 0
Authors
Lee, Hyunsoo [1 ]
Kang, Minsoo [1 ]
Han, Bohyung [1 ,2 ]
Affiliations
[1] Seoul Natl Univ, ECE, Seoul, South Korea
[2] Seoul Natl Univ, IPAI, Seoul, South Korea
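Source
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023) | 2023
Funding
National Research Foundation, Singapore;
Keywords
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract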
We present a novel algorithm for text-driven image-to-image translation based on a pretrained text-to-image diffusion model. Our method aims to generate a target image by selectively editing regions of interest in a source image, defined by a modifying text, while preserving the remaining parts. In contrast to existing techniques that solely rely on a target prompt, we introduce a new score function that additionally considers both the source image and the source text prompt, tailored to address specific translation tasks. To this end, we derive the conditional score function in a principled way, decomposing it into the standard score and a guiding term for target image generation. For the gradient computation about the guiding term, we assume a Gaussian distribution for the posterior distribution and estimate its mean and variance to adjust the gradient without additional training. In addition, to improve the quality of the conditional score guidance, we incorporate a simple yet effective mixup technique, which combines two cross-attention maps derived from the source and target latents. This strategy is effective for promoting a desirable fusion of the invariant parts in the source image and the edited regions aligned with the target prompt, leading to high-fidelity target image generation. Through comprehensive experiments, we demonstrate that our approach achieves outstanding image-to-image translation performance on various tasks. Code is available at https://github.com/Hleephilip/CSG.
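The abstract names two ingredients: a score split into the standard score plus a guiding term whose gradient comes from a Gaussian assumption, and a mixup of two cross-attention maps. The sketch below illustrates both on toy NumPy arrays. It is not the authors' implementation (see the repository linked in the abstract for that). The function names, the mixing weight `lam`, the guidance `weight`, and the use of the target latent itself as the Gaussian mean are all assumptions made for illustration.

```python
import numpy as np


def gaussian_guidance(x_src, mean, var):
    """Gradient of log N(x_src; mean, var * I) with respect to `mean`.

    Assumption: the posterior is an isotropic Gaussian with a scalar
    variance that does not depend on `mean`.
    """
    return (x_src - mean) / var


def guided_score(standard_score, guidance, weight=1.0):
    """Standard score plus a weighted guiding term (hypothetical weighting)."""
    return standard_score + weight * guidance


def mixup_attention(attn_src, attn_tgt, lam):
    """Convex combination of source and target cross-attention maps.

    `lam` in [0, 1] is a hypothetical mixing weight; lam = 1 keeps the
    source map, lam = 0 keeps the target map.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return lam * attn_src + (1.0 - lam) * attn_tgt


# Toy example: one query position attending over two text tokens.
attn_src = np.array([[0.5, 0.5]])
attn_tgt = np.array([[1.0, 0.0]])
mixed = mixup_attention(attn_src, attn_tgt, lam=0.25)

# Toy latents: the guiding term pulls the target latent toward the source.
x_src = np.array([1.0, 2.0])
x_tgt = np.array([0.0, 0.0])
guide = gaussian_guidance(x_src, mean=x_tgt, var=2.0)
score = guided_score(np.zeros(2), guide, weight=2.0)
```

Because the mixup is a convex combination, each row of the mixed map still sums to one whenever both input rows do, so it can stand in for either attention map without renormalization.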
Pages: 24
Related papers
50 in total
  • [31] UniTune: Text-Driven Image Editing by Fine Tuning a Diffusion Model on a Single Image
    Valevski, Dani
    Kalman, Matan
    Molad, Eyal
    Segalis, Eyal
    Matias, Yossi
    Leviathan, Yaniv
    ACM TRANSACTIONS ON GRAPHICS, 2023, 42 (04):
  • [32] Text2Human: Text-Driven Controllable Human Image Generation
    Jiang, Yuming
    Yang, Shuai
    Qiu, Haonan
    Wu, Wayne
    Loy, Chen Change
    Liu, Ziwei
    ACM TRANSACTIONS ON GRAPHICS, 2022, 41 (04):
  • [33] Describe What to Change: A Text-guided Unsupervised Image-to-Image Translation Approach
    Liu, Yahui
    De Nadai, Marco
    Cai, Deng
    Li, Huayang
    Alameda-Pineda, Xavier
    Sebe, Nicu
    Lepri, Bruno
    MM '20: PROCEEDINGS OF THE 28TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, 2020, : 1357 - 1365
  • [34] A novel framework for image-to-image translation and image compression
    Yang, Fei
    Wang, Yaxing
    Herranz, Luis
    Cheng, Yongmei
    Mozerov, Mikhail G.
    NEUROCOMPUTING, 2022, 508 : 58 - 70
  • [35] Guided Image Weathering using Image-to-Image Translation
    Chen, Yu
    Shen, I-Chao
    Chen, Bing-Yu
    PROCEEDINGS OF SIGGRAPH ASIA 2021 TECHNICAL COMMUNICATIONS, 2021,
  • [36] Correction to: Generative image completion with image-to-image translation
    Shuzhen Xu
    Qing Zhu
    Jin Wang
    Neural Computing and Applications, 2020, 32 : 17809 - 17809
  • [37] SDDM: Score-Decomposed Diffusion Models on Manifolds for Unpaired Image-to-Image Translation
    Sun, Shikun
    Wei, Longhui
    Xing, Junliang
    Jia, Jia
    Tian, Qi
    arXiv, 2023,
  • [38] Unsupervised Image-to-Image Translation with Generative Prior
    Yang, Shuai
    Jiang, Liming
    Liu, Ziwei
    Loy, Chen Change
    2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022), 2022, : 18311 - 18320
  • [39] Leveraging Local Domains for Image-to-Image Translation
    Dell'Eva, Anthony
    Pizzati, Fabio
    Bertozzi, Massimo
    de Charette, Raoul
    PROCEEDINGS OF THE 17TH INTERNATIONAL JOINT CONFERENCE ON COMPUTER VISION, IMAGING AND COMPUTER GRAPHICS THEORY AND APPLICATIONS (VISAPP), VOL 5, 2022, : 179 - 189
  • [40] Conditional image-to-image translation generative adversarial network (cGAN) for fabric defect data augmentation
    Mohammed, Swash Sami
    Clarke, Hülya Gökalp
    Neural Computing and Applications, 2024, 36 (32) : 20231 - 20244