Conditional Score Guidance for Text-Driven Image-to-Image Translation

Cited by: 0
Authors
Lee, Hyunsoo [1]
Kang, Minsoo [1]
Han, Bohyung [1, 2]
Affiliations
[1] Seoul Natl Univ, ECE, Seoul, South Korea
[2] Seoul Natl Univ, IPAI, Seoul, South Korea
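Funding
National Research Foundation, Singapore
Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract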
We present a novel algorithm for text-driven image-to-image translation based on a pretrained text-to-image diffusion model. Our method aims to generate a target image by selectively editing the regions of interest in a source image, as specified by a modifying text, while preserving the remaining parts. In contrast to existing techniques that rely solely on a target prompt, we introduce a new score function that additionally considers both the source image and the source text prompt, tailored to specific translation tasks. To this end, we derive the conditional score function in a principled way, decomposing it into the standard score and a guiding term for target image generation. To compute the gradient of the guiding term, we assume that the posterior distribution is Gaussian and estimate its mean and variance, which lets us adjust the gradient without additional training. In addition, to improve the quality of the conditional score guidance, we incorporate a simple yet effective mixup technique that combines two cross-attention maps derived from the source and target latents. This strategy promotes a desirable fusion of the invariant parts of the source image and the edited regions aligned with the target prompt, leading to high-fidelity target image generation. Through comprehensive experiments, we demonstrate that our approach achieves outstanding image-to-image translation performance on various tasks. Code is available at https://github.com/Hleephilip/CSG.
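The decomposition mentioned in the abstract can be illustrated with Bayes' rule. The following is only a sketch under assumed notation, not necessarily the exact formulation used in the paper: x_t denotes the noisy target latent at diffusion step t, y^tgt the target prompt, and (x^src, y^src) the source image and source prompt. None of these symbols appear in this record; they are introduced here for illustration.

```latex
% Assumed notation (hypothetical, not taken from this record):
%   x_t                    noisy target latent at step t
%   y^{tgt}                target text prompt
%   (x^{src}, y^{src})     source image and source text prompt
\begin{align}
p(x_t \mid y^{tgt}, x^{src}, y^{src})
  &= \frac{p(x_t \mid y^{tgt})\; p(x^{src}, y^{src} \mid x_t, y^{tgt})}
          {p(x^{src}, y^{src} \mid y^{tgt})} \\
\nabla_{x_t} \log p(x_t \mid y^{tgt}, x^{src}, y^{src})
  &= \underbrace{\nabla_{x_t} \log p(x_t \mid y^{tgt})}_{\text{standard score}}
   + \underbrace{\nabla_{x_t} \log p(x^{src}, y^{src} \mid x_t, y^{tgt})}_{\text{guiding term}}
\end{align}
```

The denominator in the first line does not depend on x_t, so its gradient vanishes and only the two terms shown remain. Under this reading, the Gaussian assumption described in the abstract would apply to the distribution inside the guiding term, whose mean and variance are estimated rather than learned.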
Pages: 24