Controlling Text-to-Image Diffusion by Orthogonal Finetuning

Authors
Qiu, Zeju [1]
Liu, Weiyang [1, 2]
Feng, Haiwen [1]
Xue, Yuxuan [3]
Feng, Yao [1]
Liu, Zhen [1, 4]
Zhang, Dan [3, 5]
Weller, Adrian [2, 6]
Schoelkopf, Bernhard [1]
Affiliations
[1] MPI Intelligent Syst Tubingen, Tubingen, Germany
[2] Univ Cambridge, Cambridge, England
[3] Univ Tubingen, Tubingen, Germany
[4] Univ Montreal, Mila, Montreal, PQ, Canada
[5] Bosch Ctr Artificial Intelligence, Renningen, Germany
[6] Alan Turing Inst, London, England
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
Large text-to-image diffusion models have impressive capabilities in generating photorealistic images from text prompts. How to effectively guide or control these powerful models to perform different downstream tasks has become an important open problem. To tackle this challenge, we introduce a principled finetuning method, Orthogonal Finetuning (OFT), for adapting text-to-image diffusion models to downstream tasks. Unlike existing methods, OFT can provably preserve hyperspherical energy, which characterizes the pairwise neuron relationship on the unit hypersphere. We find that this property is crucial for preserving the semantic generation ability of text-to-image diffusion models. To improve finetuning stability, we further propose Constrained Orthogonal Finetuning (COFT), which imposes an additional radius constraint on the hypersphere. Specifically, we consider two important text-to-image finetuning tasks: subject-driven generation, where the goal is to generate subject-specific images given a few images of a subject and a text prompt, and controllable generation, where the goal is to enable the model to take in additional control signals. We empirically show that our OFT framework outperforms existing methods in generation quality and convergence speed.
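The energy-preservation claim in the abstract can be illustrated with a small numerical sketch. This is not the authors' implementation. It assumes a simple form of hyperspherical energy (the sum of inverse pairwise distances between unit-normalized neurons) and assumes a Cayley transform as one way to build an orthogonal matrix. The function names and the toy sizes are hypothetical. The sketch shows that multiplying a weight matrix by an orthogonal matrix leaves this energy unchanged, while a generic additive update does not.

```python
import numpy as np

def hyperspherical_energy(W):
    """Sum of inverse pairwise distances between unit-normalized neurons.

    Neurons are the columns of W. This particular energy form is an
    assumption made for illustration.
    """
    W_hat = W / np.linalg.norm(W, axis=0, keepdims=True)
    diffs = W_hat[:, :, None] - W_hat[:, None, :]   # shape (d, n, n)
    dists = np.linalg.norm(diffs, axis=0)           # shape (n, n)
    upper = np.triu_indices(W.shape[1], k=1)        # each pair counted once
    return np.sum(1.0 / dists[upper])

def cayley_orthogonal(A):
    """Build an orthogonal matrix from an arbitrary square matrix A.

    S = A - A^T is skew-symmetric, so I - S is always invertible and
    (I + S)(I - S)^{-1} is orthogonal. Using the Cayley transform here
    is an assumption, not something stated in the abstract.
    """
    S = A - A.T
    I = np.eye(A.shape[0])
    return (I + S) @ np.linalg.inv(I - S)

rng = np.random.default_rng(0)
d, n = 8, 5                                   # hypothetical toy sizes
W = rng.standard_normal((d, n))               # stand-in for pretrained weights

R = cayley_orthogonal(0.1 * rng.standard_normal((d, d)))
W_orth = R @ W                                # orthogonal (OFT-style) update
W_add = W + 0.5 * rng.standard_normal((d, n)) # generic additive update

e_base = hyperspherical_energy(W)
e_orth = hyperspherical_energy(W_orth)
e_add = hyperspherical_energy(W_add)

print("base:", e_base)
print("orthogonal update:", e_orth)
print("additive update:", e_add)
```

Because an orthogonal matrix preserves norms and angles, every pairwise distance between normalized neurons is unchanged, which is why the first two printed values agree up to floating-point error.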
Pages: 43