Controlling Text-to-Image Diffusion by Orthogonal Finetuning

Cited by: 0
Authors
Qiu, Zeju [1 ]
Liu, Weiyang [1 ,2 ]
Feng, Haiwen [1 ]
Xue, Yuxuan [3 ]
Feng, Yao [1 ]
Liu, Zhen [1 ,4 ]
Zhang, Dan [3 ,5 ]
Weller, Adrian [2 ,6 ]
Schoelkopf, Bernhard [1 ]
Affiliations
[1] MPI Intelligent Syst Tubingen, Tubingen, Germany
[2] Univ Cambridge, Cambridge, England
[3] Univ Tubingen, Tubingen, Germany
[4] Univ Montreal, Mila, Montreal, PQ, Canada
[5] Bosch Ctr Artificial Intelligence, Renningen, Germany
[6] Alan Turing Inst, London, England
DOI: not available
CLC classification: TP18 (Artificial Intelligence Theory)
Discipline codes: 081104; 0812; 0835; 1405
Abstract
Large text-to-image diffusion models have impressive capabilities in generating photorealistic images from text prompts. How to effectively guide or control these powerful models to perform different downstream tasks has become an important open problem. To tackle this challenge, we introduce a principled finetuning method, Orthogonal Finetuning (OFT), for adapting text-to-image diffusion models to downstream tasks. Unlike existing methods, OFT provably preserves the hyperspherical energy that characterizes the pairwise neuron relationship on the unit hypersphere. We find that this property is crucial for preserving the semantic generation ability of text-to-image diffusion models. To improve finetuning stability, we further propose Constrained Orthogonal Finetuning (COFT), which imposes an additional radius constraint on the hypersphere. Specifically, we consider two important text-to-image finetuning tasks: subject-driven generation, where the goal is to generate subject-specific images given a few images of a subject and a text prompt, and controllable generation, where the goal is to enable the model to take in additional control signals. We empirically show that our OFT framework outperforms existing methods in generation quality and convergence speed.
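The central claim above, that an orthogonal transform of the weights leaves the hyperspherical energy unchanged, can be illustrated with a minimal numpy sketch. This is not the authors' implementation; the dimensions, the Cayley parameterization, and the `hyperspherical_energy` helper (sum of inverse pairwise distances between unit-normalized neurons) are illustrative assumptions made here for the demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 8, 16                        # neuron dimension, number of neurons (illustrative)
W = rng.standard_normal((d, n))     # stand-in for a pretrained weight matrix; columns are neurons

# Trainable skew-symmetric parameter; the Cayley transform of a
# skew-symmetric S, R = (I + S)^{-1}(I - S), is always orthogonal.
A = rng.standard_normal((d, d)) * 0.1
S = A - A.T
I = np.eye(d)
R = np.linalg.solve(I + S, I - S)   # orthogonal finetuning matrix

W_ft = R @ W                        # "finetuned" weights: neurons rotated jointly

def hyperspherical_energy(W):
    """Sum of inverse pairwise distances between unit-normalized neurons."""
    U = W / np.linalg.norm(W, axis=0, keepdims=True)   # project neurons to unit sphere
    D = np.linalg.norm(U[:, :, None] - U[:, None, :], axis=0)
    iu = np.triu_indices(U.shape[1], k=1)              # each unordered pair once
    return np.sum(1.0 / D[iu])

# R preserves norms and pairwise angles, so the energy is invariant:
print(np.allclose(hyperspherical_energy(W), hyperspherical_energy(W_ft)))  # True
```

Because `R` preserves every neuron's norm and every pairwise angle, the pairwise geometry on the unit hypersphere, and hence the energy, is unchanged; a full-rank update like LoRA's additive delta offers no such guarantee.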
Pages: 43