Multi-Region Text-Driven Manipulation of Diffusion Imagery

Times cited: 0

Authors:

Li, Yiming [1, 2]

Zhou, Peng [3]

Sun, Jun [1]

Xu, Yi [1, 2]

Affiliations:

[1] Shanghai Jiao Tong Univ, Shanghai Key Lab Digital Media Proc & Transmiss, Shanghai, Peoples R China

[2] Shanghai Jiao Tong Univ, AI Inst, MoE, Key Lab Artificial Intelligence, Shanghai, Peoples R China

[3] China Mobile Suzhou Software Technol Co Ltd, Suzhou, Peoples R China

Source:

THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 4 | 2024

DOI:

Not available

Chinese Library Classification:

TP18 [Artificial intelligence theory]

Discipline codes:

081104; 0812; 0835; 1405

Abstract:

Text-guided image manipulation has recently attracted significant attention. Prevailing techniques focus on editing the attributes of individual objects but struggle with multi-object editing, mainly because they lack consistency constraints on the spatial layout. This work presents a multi-region guided image manipulation framework that enables manipulation through region-level textual prompts. Building on MultiDiffusion as a baseline, we aim to automatically generate a plausible spatial distribution of multiple objects, in which disparate regions are fused into a unified whole. To mitigate interference caused by regional fusion, we employ an off-the-shelf model (CLIP) to impose region-aware spatial guidance on multi-object manipulation. Moreover, when the framework is applied to Stable Diffusion, lengthy prompts containing quality-related but object-agnostic words hamper the manipulation. To keep guidance and generation focused on meaningful object-specific words, we introduce a keyword selection method. Finally, we demonstrate a downstream application of our method to multi-region inversion, tailored for manipulating multiple objects in real images. Our approach is compatible with variants of Stable Diffusion and readily applies to manipulating diverse objects across a wide range of images with high generation quality, showing superb image control capabilities. Code is available at https://github.com/liyiming09/multi-region-guided-diffusion.
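The region fusion inherited from the MultiDiffusion baseline can be illustrated with a small sketch. As we understand the published MultiDiffusion formulation, each region is denoised under its own prompt and the per-region predictions are then averaged pixel-wise, weighted by the region masks. This sketch assumes plain 2D lists stand in for latent tensors and that pixels covered by no mask fall back to 0.0. `fuse_regions` is a hypothetical helper, not code from the paper's repository, and it omits the CLIP guidance and keyword selection described in the abstract.

```python
from typing import List, Sequence

Grid = List[List[float]]  # stand-in for one latent channel (assumption of this sketch)


def fuse_regions(preds: Sequence[Grid], masks: Sequence[Grid], eps: float = 1e-8) -> Grid:
    """Mask-weighted average of per-region predictions (MultiDiffusion-style fusion).

    preds[i][y][x]: value predicted for pixel (y, x) under region i's prompt.
    masks[i][y][x]: weight of region i at pixel (y, x), e.g. 0.0 or 1.0.
    Hypothetical helper for illustration only.
    """
    if not preds or len(preds) != len(masks):
        raise ValueError("need one mask per prediction")
    h, w = len(preds[0]), len(preds[0][0])
    fused: Grid = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Weighted sum of every region's prediction at this pixel.
            num = sum(m[y][x] * p[y][x] for p, m in zip(preds, masks))
            # Total mask weight at this pixel; overlapping regions share it.
            den = sum(m[y][x] for m in masks)
            # Assumption: uncovered pixels fall back to 0.0.
            fused[y][x] = num / den if den > eps else 0.0
    return fused


if __name__ == "__main__":
    # Region 0 covers both pixels; region 1 covers only the second pixel.
    preds = [[[1.0, 1.0]], [[3.0, 3.0]]]
    masks = [[[1.0, 1.0]], [[0.0, 1.0]]]
    print(fuse_regions(preds, masks))  # [[1.0, 2.0]]
```

Where two regions overlap, their predictions are averaged (the second pixel becomes 2.0), which is one way disparate regions end up fused into a single image. A full-canvas background prompt is commonly used in this kind of setup, so every pixel has nonzero total weight and the 0.0 fallback is rarely reached.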

Pages: 3261-3269

Page count: 9
