Controlling Text-to-Image Diffusion by Orthogonal Finetuning

Cited by: 0
Authors
Qiu, Zeju [1 ]
Liu, Weiyang [1 ,2 ]
Feng, Haiwen [1 ]
Xue, Yuxuan [3 ]
Feng, Yao [1 ]
Liu, Zhen [1 ,4 ]
Zhang, Dan [3 ,5 ]
Weller, Adrian [2 ,6 ]
Schoelkopf, Bernhard [1 ]
Affiliations
[1] MPI Intelligent Syst Tubingen, Tubingen, Germany
[2] Univ Cambridge, Cambridge, England
[3] Univ Tubingen, Tubingen, Germany
[4] Univ Montreal, Mila, Montreal, PQ, Canada
[5] Bosch Ctr Artificial Intelligence, Renningen, Germany
[6] Alan Turing Inst, London, England
Keywords
DOI
Not available
CLC Number
TP18 [Theory of Artificial Intelligence];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Large text-to-image diffusion models have impressive capabilities in generating photorealistic images from text prompts. How to effectively guide or control these powerful models to perform different downstream tasks becomes an important open problem. To tackle this challenge, we introduce Orthogonal Finetuning (OFT), a principled finetuning method for adapting text-to-image diffusion models to downstream tasks. Unlike existing methods, OFT provably preserves the hyperspherical energy that characterizes pairwise neuron relationships on the unit hypersphere. We find that this property is crucial for preserving the semantic generation ability of text-to-image diffusion models. To improve finetuning stability, we further propose Constrained Orthogonal Finetuning (COFT), which imposes an additional radius constraint on the hypersphere. Specifically, we consider two important text-to-image finetuning tasks: subject-driven generation, where the goal is to generate subject-specific images given a few images of a subject and a text prompt, and controllable generation, where the goal is to enable the model to take in additional control signals. We empirically show that our OFT framework outperforms existing methods in generation quality and convergence speed.
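The core idea described in the abstract can be illustrated with a minimal sketch: OFT adapts a pretrained weight matrix by multiplying it with a learned orthogonal matrix, and because orthogonal transforms preserve inner products, the pairwise angles between neurons (and hence the hyperspherical energy) are unchanged. The snippet below is an illustrative sketch, not the authors' implementation; the Cayley parametrization and all variable names are assumptions for demonstration.

```python
import numpy as np

def cayley(q):
    """Map a skew-symmetric matrix Q to an orthogonal R = (I + Q)(I - Q)^-1.

    For skew-symmetric Q the matrix I - Q is always invertible, so this
    always yields a valid orthogonal matrix (a common way to keep a
    learnable matrix exactly orthogonal during optimization).
    """
    i = np.eye(q.shape[0])
    return (i + q) @ np.linalg.inv(i - q)

rng = np.random.default_rng(0)
d = 4

# A learnable skew-symmetric parameter: Q^T = -Q.
a = rng.standard_normal((d, d))
q = a - a.T
r = cayley(q)

# Pretrained weight: each column is one neuron's weight vector in R^d.
w = rng.standard_normal((d, 8))
w_new = r @ w  # "finetuned" weight: every neuron is rotated by the same R

# R is orthogonal: R^T R = I.
assert np.allclose(r.T @ r, np.eye(d), atol=1e-8)
# The neuron Gram matrix W^T W (all pairwise inner products, hence all
# pairwise angles on the hypersphere) is exactly preserved.
assert np.allclose(w_new.T @ w_new, w.T @ w, atol=1e-8)
```

The radius constraint of COFT mentioned in the abstract would additionally keep R close to the identity (e.g., by bounding the magnitude of Q), which limits how far the finetuned model can drift from the pretrained one.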
Pages: 43
Related Papers
50 items in total
  • [41] Unleashing Text-to-Image Diffusion Prior for Zero-Shot Image Captioning
Luo, Jianjie
    Chen, Jingwen
    Li, Yehao
    Pan, Yingwei
    Feng, Jianlin
Chao, Hongyang
    Yao, Ting
    COMPUTER VISION-ECCV 2024, PT LVII, 2025, 15115 : 237 - 254
  • [42] From text to mask: Localizing entities using the attention of text-to-image diffusion models
    Xiao, Changming
    Yang, Qi
    Zhou, Feng
    Zhang, Changshui
    NEUROCOMPUTING, 2024, 610
  • [43] PromptMix: Text-to-image diffusion models enhance the performance of lightweight networks
    Bakhtiarnia, Arian
    Zhang, Qi
    Iosifidis, Alexandros
    2023 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS, IJCNN, 2023,
  • [44] Lost in Translation: Latent Concept Misalignment in Text-to-Image Diffusion Models
    Zhao, Juntu
    Deng, Junyu
    Ye, Yixin
    Li, Chongxuan
    Deng, Zhijie
    Wang, Dequan
    COMPUTER VISION - ECCV 2024, PT LXIX, 2025, 15127 : 318 - 333
  • [45] Point-Cloud Completion with Pretrained Text-to-image Diffusion Models
    Kasten, Yoni
    Rahamim, Ohad
    Chechik, Gal
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
  • [46] Exploring Phrase-Level Grounding with Text-to-Image Diffusion Model
    Yang, Danni
    Dong, Ruohan
    Ji, Jiayi
    Ma, Yiwei
    Wang, Haowei
    Sun, Xiaoshuai
    Ji, Rongrong
    COMPUTER VISION - ECCV 2024, PT LIII, 2025, 15111 : 161 - 180
  • [47] Comparative Review of Text-to-Image Generation Techniques Based on Diffusion Models
    Gao, Xinyu
    Du, Fang
    Song, Lijuan
    Computer Engineering and Applications, 2024, 60 (24) : 44 - 64
  • [48] MagicFusion: Boosting Text-to-Image Generation Performance by Fusing Diffusion Models
    Zhao, Jing
    Zheng, Heliang
    Wang, Chaoyue
    Lan, Long
    Yang, Wenjing
    2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2023), 2023, : 22535 - 22545
  • [49] Separate-and-Enhance: Compositional Finetuning for Text2Image Diffusion Models
    Bao, Zhipeng
    Li, Yijun
    Singh, Krishna Kumar
    Wang, Yu-Xiong
    Hebert, Martial
    PROCEEDINGS OF SIGGRAPH 2024 CONFERENCE PAPERS, 2024,
  • [50] Temporal Adaptive Attention Map Guidance for Text-to-Image Diffusion Models
    Jung, Sunghoon
    Heo, Yong Seok
    ELECTRONICS, 2025, 14 (03):