StyleGAN-NADA: CLIP-Guided Domain Adaptation of Image Generators

Cited by: 208
Authors
Gal, Rinon [1 ,2 ]
Patashnik, Or [1 ]
Maron, Haggai [2 ]
Bermano, Amit H. [1 ]
Chechik, Gal [2 ]
Cohen-Or, Daniel [1 ]
Affiliations
[1] Tel Aviv Univ, Tel Aviv, Israel
[2] NVIDIA, Tel Aviv, Israel
Source
ACM TRANSACTIONS ON GRAPHICS, 2022, Vol. 41, No. 4
Keywords
Generator Domain Adaptation; Text-Guided Content Generation; Zero-Shot Training;
DOI
10.1145/3528223.3530164
Chinese Library Classification
TP31 [Computer Software];
Subject Classification Codes
081202; 0835;
Abstract
Can a generative model be trained to produce images from a specific domain, guided only by a text prompt, without seeing any image? In other words: can an image generator be trained "blindly"? Leveraging the semantic power of large scale Contrastive-Language-Image-Pre-training (CLIP) models, we present a text-driven method that allows shifting a generative model to new domains, without having to collect even a single image. We show that through natural language prompts and a few minutes of training, our method can adapt a generator across a multitude of domains characterized by diverse styles and shapes. Notably, many of these modifications would be difficult or infeasible to reach with existing methods. We conduct an extensive set of experiments across a wide range of domains. These demonstrate the effectiveness of our approach, and show that our models preserve the latent-space structure that makes generative models appealing for downstream tasks. Code and videos available at: stylegan-nada.github.io/
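The core training signal behind this kind of text-guided adaptation is a directional loss in CLIP space: the shift from the source-domain image embedding to the adapted-generator image embedding is encouraged to be parallel to the shift from the source text prompt embedding to the target text prompt embedding. Below is a minimal sketch of that loss using NumPy, with toy random vectors standing in for the outputs of a frozen CLIP encoder (the function name `directional_clip_loss` and the 512-dim toy embeddings are illustrative assumptions, not code from the paper):

```python
import numpy as np

def normalize(v):
    """Project a vector onto the unit sphere (CLIP embeddings are compared by direction)."""
    return v / np.linalg.norm(v)

def directional_clip_loss(e_img_src, e_img_gen, e_txt_src, e_txt_tgt):
    """Directional CLIP loss: 1 - cos(angle) between the image-space shift
    (generated minus source image embedding) and the text-space shift
    (target minus source prompt embedding)."""
    d_img = normalize(e_img_gen - e_img_src)
    d_txt = normalize(e_txt_tgt - e_txt_src)
    return 1.0 - float(np.dot(d_img, d_txt))

# Toy stand-ins for CLIP embeddings (in practice: a frozen CLIP image/text encoder).
rng = np.random.default_rng(0)
e_txt_src = rng.normal(size=512)               # e.g. embedding of "photo"
e_txt_tgt = e_txt_src + rng.normal(size=512)   # e.g. embedding of "sketch"
e_img_src = rng.normal(size=512)
# An image shift exactly parallel to the text shift yields ~0 loss:
e_img_gen = e_img_src + 0.5 * (e_txt_tgt - e_txt_src)
print(directional_clip_loss(e_img_src, e_img_gen, e_txt_src, e_txt_tgt))  # ~0.0
```

During adaptation, only the generator producing `e_img_gen` is optimized; both CLIP encoders and the frozen source generator stay fixed, which is what lets training proceed without any images from the target domain.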
Pages: 13
Related Papers (25 in total)
  • [1] StyleGAN-based CLIP-guided Image Shape Manipulation
    Qian, Yuchen
    Yamamoto, Kohei
    Yanai, Keiji
    19TH INTERNATIONAL CONFERENCE ON CONTENT-BASED MULTIMEDIA INDEXING, CBMI 2022, 2022, : 162 - 166
  • [2] CLIP-guided black-box domain adaptation of image classification
    Tian, Liang
    Ye, Mao
    Zhou, Lihua
    He, Qichen
    SIGNAL IMAGE AND VIDEO PROCESSING, 2024, 18 (05) : 4637 - 4646
  • [3] CLIP-guided StyleGAN Inversion for Text-driven Real Image Editing
    Baykal, Ahmet Canberk
    Anees, Abdul Basit
    Ceylan, Duygu
    Erdem, Erkut
    Erdem, Aykut
    Yuret, Deniz
    ACM TRANSACTIONS ON GRAPHICS, 2023, 42 (05):
  • [4] Image-Based CLIP-Guided Essence Transfer
    Chefer, Hila
    Benaim, Sagie
    Paiss, Roni
    Wolf, Lior
    COMPUTER VISION, ECCV 2022, PT XIII, 2022, 13673 : 695 - 711
  • [5] RAVE: Residual Vector Embedding for CLIP-Guided Backlit Image Enhancement
    Gaintseva, Tatiana
    Benning, Martin
    Slabaugh, Gregory
    COMPUTER VISION - ECCV 2024, PT LXXIX, 2025, 15137 : 412 - 428
  • [6] CgT-GAN: CLIP-guided Text GAN for Image Captioning
    Yu, Jiarui
    Li, Haoran
    Hao, Yanbin
    Zhu, Bin
    Xu, Tong
    He, Xiangnan
    PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2023, 2023, : 2252 - 2263
  • [7] Text Guided Facial Image Synthesis Using StyleGAN and Variational Autoencoder Trained CLIP
    Srinivasa, Anagha
    Praveen, Anjali
    Mavathur, Anusha
    Pothumarthi, Apurva
    Arya, Arti
    Agarwal, Pooja
    ARTIFICIAL INTELLIGENCE AND SOFT COMPUTING, ICAISC 2023, PT II, 2023, 14126 : 78 - 90
  • [8] On Mitigating Stability-Plasticity Dilemma in CLIP-guided Image Morphing via Geodesic Distillation Loss
    Oh, Yeongtak
    Lee, Saehyung
    Hwang, Uiwon
    Yoon, Sungroh
    INTERNATIONAL JOURNAL OF COMPUTER VISION, 2024, : 2721 - 2751
  • [9] Adversarial domain adaptation with CLIP for few-shot image classification
    Sun, Tongfeng
    Yang, Hongjian
    Li, Zhongnian
    Xu, Xinzheng
    Wang, Xiurui
    APPLIED INTELLIGENCE, 2025, 55 (01)
  • [10] CMMF-Net: a generative network based on CLIP-guided multi-modal feature fusion for thermal infrared image colorization
    Jiang, Qian
    Zhou, Tao
    He, Youwei
    Ma, Wenjun
    Hou, Jingyu
    Ghani, Ahmad Shahrizan Abdul
    Miao, Shengfa
    Jin, Xin
    INTELLIGENCE & ROBOTICS, 2025, 5 (01): 34 - 49