Multi-Region Text-Driven Manipulation of Diffusion Imagery

Cited by: 0
Authors
Li, Yiming [1, 2]
Zhou, Peng [3]
Sun, Jun [1]
Xu, Yi [1, 2]
Affiliations
[1] Shanghai Jiao Tong University, Shanghai Key Laboratory of Digital Media Processing and Transmission, Shanghai, People's Republic of China
[2] Shanghai Jiao Tong University, AI Institute, MoE Key Laboratory of Artificial Intelligence, Shanghai, People's Republic of China
[3] China Mobile (Suzhou) Software Technology Co., Ltd., Suzhou, People's Republic of China
Keywords
(none listed)
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Text-guided image manipulation has recently attracted significant attention. Prevailing techniques concentrate on editing the attributes of individual objects but struggle with multi-object editing, mainly because they lack consistency constraints on the spatial layout. This work presents a multi-region guided image manipulation framework that enables manipulation through region-level textual prompts. Using MultiDiffusion as a baseline, we focus on automatically generating a plausible multi-object spatial layout in which disparate regions are fused into a unified whole. To mitigate interference introduced by regional fusion, we employ an off-the-shelf model (CLIP) to impose region-aware spatial guidance on multi-object manipulation. Moreover, when the framework is applied to Stable Diffusion, lengthy prompts containing quality-related but object-agnostic words hamper manipulation. To keep guidance and generation focused on meaningful, object-specific words, we introduce a keyword selection method. Furthermore, we demonstrate a downstream application of our method to multi-region inversion, tailored for manipulating multiple objects in real images. Our approach is compatible with variants of Stable Diffusion and readily applies to manipulating diverse objects across a wide range of images with high-quality generation, demonstrating strong image-control capability. Code is available at https://github.com/liyiming09/multi-region-guided-diffusion.
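The region fusion inherited from the MultiDiffusion baseline can be illustrated with a minimal sketch. It assumes that at each denoising step every region i proposes its own denoised latent under its own text prompt, together with a mask giving that region's weight at each pixel; the fused latent is then the per-pixel mask-weighted average of the proposals, which is the MultiDiffusion update. The function name `fuse_regions` and the scalar-grid representation are hypothetical simplifications (real latents are multi-channel tensors), this is not the authors' code from the linked repository, and neither the CLIP-based region-aware guidance nor the keyword selection is modeled here.

```python
from typing import List, Sequence

# Hypothetical simplification: one scalar latent value per pixel.
Grid = List[List[float]]


def fuse_regions(predictions: Sequence[Grid], masks: Sequence[Grid], eps: float = 1e-8) -> Grid:
    """Fuse region-wise predictions by a per-pixel mask-weighted average.

    predictions[i][y][x]: value proposed for pixel (y, x) by region i's prompt.
    masks[i][y][x]: weight in [0, 1] of region i at pixel (y, x).
    Pixels covered by no mask are set to 0.0 (in practice a full-image
    background prompt usually covers every pixel, so this rarely happens).
    """
    if not predictions or len(predictions) != len(masks):
        raise ValueError("need exactly one mask per region prediction")
    height, width = len(predictions[0]), len(predictions[0][0])
    fused: Grid = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            # Numerator: sum of each region's proposal weighted by its mask.
            num = sum(m[y][x] * p[y][x] for p, m in zip(predictions, masks))
            # Denominator: total mask weight at this pixel.
            den = sum(m[y][x] for m in masks)
            fused[y][x] = num / den if den > eps else 0.0
    return fused


if __name__ == "__main__":
    # Two overlapping regions on a 2x2 latent.
    preds = [[[1.0, 1.0], [1.0, 1.0]], [[3.0, 3.0], [3.0, 3.0]]]
    masks = [[[1, 1], [1, 0]], [[0, 1], [1, 1]]]
    print(fuse_regions(preds, masks))  # [[1.0, 2.0], [2.0, 3.0]]
```

In the usage example, pixels claimed by a single region keep that region's proposal, while overlapping pixels receive the average. This averaging is what lets disparate regions blend into one image, and it is also the source of the cross-region interference that the abstract says CLIP guidance is meant to mitigate.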
Pages: 3261-3269
Page count: 9