BLIP-Diffusion: Pre-trained Subject Representation for Controllable Text-to-Image Generation and Editing

Cited by: 0
Authors
Li, Dongxu [1]
Li, Junnan [1]
Hoi, Steven C. H. [1]
Affiliations
[1] Salesforce AI Res, Sydney, NSW, Australia
Keywords
DOI
Not available
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Subject-driven text-to-image generation models create novel renditions of an input subject based on text prompts. Existing models suffer from lengthy fine-tuning and difficulty preserving subject fidelity. To overcome these limitations, we introduce BLIP-Diffusion, a new subject-driven image generation model that supports multimodal control, consuming subject images and text prompts as inputs. Unlike other subject-driven generation models, BLIP-Diffusion introduces a new multimodal encoder that is pre-trained to provide subject representation. We first pre-train the multimodal encoder following BLIP-2 to produce visual representations aligned with the text. We then design a subject representation learning task that enables a diffusion model to leverage such visual representations and generate new subject renditions. Compared with previous methods such as DreamBooth, our model enables zero-shot subject-driven generation and efficient fine-tuning for customized subjects with up to 20x speedup. We also show that BLIP-Diffusion can be flexibly combined with existing techniques such as ControlNet and prompt-to-prompt to enable novel subject-driven generation and editing applications.
Pages: 21
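As a usage illustration of the zero-shot subject-driven generation described in the abstract, the sketch below assumes the Hugging Face diffusers port of BLIP-Diffusion (BlipDiffusionPipeline with the Salesforce/blipdiffusion checkpoint); the pipeline name, checkpoint id, and argument order follow the diffusers documentation as best recalled and should be verified against the installed version. The reference image path is a placeholder.

```python
# Minimal sketch: zero-shot subject-driven generation with BLIP-Diffusion,
# assuming the diffusers port (BlipDiffusionPipeline) and the
# "Salesforce/blipdiffusion" checkpoint. Verify names/arguments against
# your installed diffusers version.
import torch
from diffusers.pipelines import BlipDiffusionPipeline
from diffusers.utils import load_image

pipe = BlipDiffusionPipeline.from_pretrained(
    "Salesforce/blipdiffusion", torch_dtype=torch.float16
).to("cuda")

reference_image = load_image("dog.jpg")  # placeholder: any photo of the subject
source_subject = "dog"                   # category of the subject in the reference image
target_subject = "dog"                   # category to render the subject as in the output
prompt = "swimming underwater"           # text prompt describing the new rendition

images = pipe(
    prompt,
    reference_image,
    source_subject,
    target_subject,
    guidance_scale=7.5,
    num_inference_steps=25,
    neg_prompt="lowres, cropped, worst quality, low quality",
    height=512,
    width=512,
).images

images[0].save("subject_rendition.png")
```

For the ControlNet combination mentioned in the abstract, diffusers also appears to ship a BlipDiffusionControlNetPipeline that additionally takes a structure-condition image (e.g. a Canny edge map) alongside the subject reference image and prompt.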
Related Papers
50 records in total
  • [21] Automatic Title Generation for Text with Pre-trained Transformer Language Model
    Mishra, Prakhar
    Diwan, Chaitali
    Srinivasa, Srinath
    Srinivasaraghavan, G.
    2021 IEEE 15TH INTERNATIONAL CONFERENCE ON SEMANTIC COMPUTING (ICSC 2021), 2021, : 17 - 24
  • [22] HOIEdit: Human-object interaction editing with text-to-image diffusion model
    Xu, Tang
    Wang, Wenbin
    Zhong, Alin
    VISUAL COMPUTER, 2025,
  • [23] BioBERT: a pre-trained biomedical language representation model for biomedical text mining
    Lee, Jinhyuk
    Yoon, Wonjin
    Kim, Sungdong
    Kim, Donghyeon
    Kim, Sunkyu
    So, Chan Ho
    Kang, Jaewoo
    BIOINFORMATICS, 2020, 36 (04) : 1234 - 1240
  • [24] Controllable Generation from Pre-trained Language Models via Inverse Prompting
    Zou, Xu
    Yin, Da
    Zhong, Qingyang
    Yang, Hongxia
    Yang, Zhilin
    Tang, Jie
    KDD '21: PROCEEDINGS OF THE 27TH ACM SIGKDD CONFERENCE ON KNOWLEDGE DISCOVERY & DATA MINING, 2021, : 2450 - 2460
  • [25] Survey on leveraging pre-trained generative adversarial networks for image editing and restoration
    Liu, Ming
    Wei, Yuxiang
    Wu, Xiaohe
    Zuo, Wangmeng
    Zhang, Lei
    SCIENCE CHINA-INFORMATION SCIENCES, 2023, 66 (05)
  • [26] FinBERT: A Pre-trained Financial Language Representation Model for Financial Text Mining
    Liu, Zhuang
    Huang, Degen
    Huang, Kaiyu
    Li, Zhuang
    Zhao, Jun
    PROCEEDINGS OF THE TWENTY-NINTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2020, : 4513 - 4519
  • [28] PAIR: Planning and Iterative Refinement in Pre-trained Transformers for Long Text Generation
    Hua, Xinyu
    Wang, Lu
    PROCEEDINGS OF THE 2020 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP), 2020, : 781 - 793
  • [29] Attribute Alignment: Controlling Text Generation from Pre-trained Language Models
    Yu, Dian
    Yu, Zhou
    Sagae, Kenji
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, EMNLP 2021, 2021, : 2251 - 2268
  • [30] Comparative Review of Text-to-Image Generation Techniques Based on Diffusion Models
    Gao, Xinyu
    Du, Fang
    Song, Lijuan
    Computer Engineering and Applications, 2024, 60 (24) : 44 - 64