Text-Guided Multi-region Scene Image Editing Based on Diffusion Model

Cited by: 0
Authors
Li, Ruichen [1]
Wu, Lei [1]
Wang, Changshuo [1]
Dong, Pei [1]
Li, Xin [1]
Affiliations
[1] Shandong University, Jinan, People's Republic of China
Keywords
Text-guided image editing; Diffusion model; Image manipulation
DOI
10.1007/978-981-97-5612-4_20
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Tremendous progress in neural image generation, coupled with the emergence of seemingly omnipotent vision-language models, has finally enabled text-guided editing of realistic scene images. Recent works use diffusion models, and most of them focus on editing a single region based on a given text prompt. When the user delineates multiple regions, these models cannot edit each region according to its own text semantics. Hence, we propose a new diffusion-based text-guided multi-region scene image editing model that handles multiple regions and their corresponding texts, and that focuses on entity-level object editing and layout-level background coordination at different denoising steps, respectively. In the early denoising steps, we propose a mask-dilation-based object editing method that dilates thinner masks to ensure accurate editing of multiple objects. For layout-level background coordination, we not only use a noisy version of the original scene image in place of the random noise in the background region during the reverse diffusion process, but also propose Outward Low-pass Filtering (OutwardLPF) to eliminate sharp transitions in noise level between edited image regions. Extensive experiments show that our model outperforms all baselines in multi-object entity editing and background coordination.
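The two mask-related ideas in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `dilate` and `blend_background`, the square structuring element, the `radius` parameter, and the hard (unfiltered) blend are all assumptions made for illustration. OutwardLPF is left out because the abstract does not specify it.

```python
from typing import List

Mask = List[List[int]]    # 1 = region to edit, 0 = background
Grid = List[List[float]]  # one channel of a (noisy) image or latent


def dilate(mask: Mask, radius: int = 1) -> Mask:
    """Binary dilation with a square (2*radius+1) structuring element.

    Hypothetical helper: widens thin masks so the edit has room to act.
    """
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if mask[y][x]:
                # Mark every pixel within `radius` of a set pixel.
                for ny in range(max(0, y - radius), min(h, y + radius + 1)):
                    for nx in range(max(0, x - radius), min(w, x + radius + 1)):
                        out[ny][nx] = 1
    return out


def blend_background(edited: Grid, noised_original: Grid, mask: Mask) -> Grid:
    """Keep edited values inside the mask and noised-original values outside.

    Assumed to be applied once per denoising step, with `noised_original`
    being the original image noised to the current step.
    """
    return [
        [e if m else o for e, o, m in zip(e_row, o_row, m_row)]
        for e_row, o_row, m_row in zip(edited, noised_original, mask)
    ]


# A one-pixel-wide vertical mask, widened to three pixels.
thin = [
    [0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0],
]
wide = dilate(thin, radius=1)

edited = [[1.0] * 5 for _ in range(5)]
noised_original = [[0.0] * 5 for _ in range(5)]
blended = blend_background(edited, noised_original, wide)
```

With several regions, one such mask per text prompt would presumably be dilated and blended in turn. The hard boundary this blend leaves between regions is the kind of sharp transition the abstract says OutwardLPF is meant to remove.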
Pages: 229-240
Number of pages: 12