Text-Guided Multi-region Scene Image Editing Based on Diffusion Model

Cited by: 0
Authors
Li, Ruichen [1]
Wu, Lei [1]
Wang, Changshuo [1]
Dong, Pei [1]
Li, Xin [1]
Affiliations
[1] Shandong Univ, Jinan, Peoples R China
Keywords
Text-guided image editing; Diffusion model; Image manipulation;
DOI
10.1007/978-981-97-5612-4_20
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
The tremendous progress in neural image generation, coupled with the emergence of seemingly omnipotent vision-language models, has finally enabled text-guided editing of realistic scene images. The latest works utilize diffusion models, and most studies focus on editing individual regions based on a given text prompt. When the user delineates multiple regions, these models cannot edit the corresponding areas according to their different text semantics. Hence, we propose a new diffusion-based text-guided multi-region scene image editing model, which handles multiple regions with their corresponding texts and focuses on entity-level object editing and layout-level background coordination at different denoising steps, respectively. In the early denoising steps, we propose a mask-dilation-based object editing method that dilates thinner masks to ensure the accuracy of editing multiple objects. For layout-level background coordination, we not only encourage the noisy version of the original scene image to replace the random noise in the background region during the reverse diffusion process, but also propose Outward Low-pass Filtering (OutwardLPF) to eliminate sharp transitions in noise level between edited image regions. We conduct extensive experiments showing that our model outperforms all baselines in terms of multi-object entity editing and background coordination.
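As a rough illustration of two ideas named in the abstract (dilating a thin region mask, and filling the background with a noised copy of the original image during denoising), the following is a minimal NumPy sketch. It is not the authors' implementation: the function names, the square structuring element, the 2-D grayscale arrays, and the use of the standard forward-diffusion formula with a cumulative coefficient `alpha_bar_t` are all assumptions made for illustration. OutwardLPF is not sketched, since the abstract does not specify how it is computed.

```python
import numpy as np

def dilate_mask(mask, radius=1):
    """Binary dilation with a (2*radius+1) x (2*radius+1) square element.

    Hypothetical helper: the abstract only says thin masks are dilated,
    not which structuring element or radius is used.
    """
    mask = mask.astype(bool)
    padded = np.pad(mask, radius, mode="constant", constant_values=False)
    out = np.zeros_like(mask)
    h, w = mask.shape
    # OR together every shifted copy of the mask inside the window.
    for dy in range(2 * radius + 1):
        for dx in range(2 * radius + 1):
            out |= padded[dy:dy + h, dx:dx + w]
    return out

def noise_original(x0, alpha_bar_t, rng):
    """Standard forward diffusion: sqrt(a) * x0 + sqrt(1 - a) * eps."""
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bar_t) * x0 + np.sqrt(1.0 - alpha_bar_t) * eps

def blend_background(x_t_edit, x0, mask, alpha_bar_t, rng):
    """Keep the edited sample inside the mask and the noised original outside."""
    x_t_orig = noise_original(x0, alpha_bar_t, rng)
    m = mask.astype(x0.dtype)
    return m * x_t_edit + (1.0 - m) * x_t_orig
```

For several user-drawn regions, one plausible usage is to dilate each region mask separately, take their union as the editing mask, and call `blend_background` once per denoising step with that step's `alpha_bar_t`.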
Pages: 229-240
Page count: 12
Related papers
50 in total
  • [31] Text-Guided Customizable Image Synthesis and Manipulation
    Zhang, Zhiqiang
    Fu, Chen
    Weng, Wei
    Zhou, Jinjia
    APPLIED SCIENCES-BASEL, 2022, 12 (20):
  • [32] Text-guided Unsupervised Latent Transformation for Multi-attribute Image Manipulation
    Wei, Xiwen
    Xu, Zhen
    Liu, Cheng
    Wu, Si
    Yu, Zhiwen
    Wong, Hau San
    2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2023, : 19285 - 19294
  • [33] LETTER EMBEDDING GUIDANCE DIFFUSION MODEL FOR SCENE TEXT EDITING
    Wang, Changshuo
    Wu, Lei
    Chen, Xu
    Li, Xiang
    Meng, Lei
    Meng, Xiangxu
    2023 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO, ICME, 2023, : 588 - 593
  • [34] Advances in text-guided 3D editing: a survey
    Lu, Lihua
    Li, Ruyang
    Zhang, Xiaohui
    Wei, Hui
    Du, Guoguang
    Wang, Binqiang
    ARTIFICIAL INTELLIGENCE REVIEW, 2024, 57 (12)
  • [35] MaskDiffuse: Text-Guided Face Mask Removal Based on Diffusion Models
    Lu, Jingxia
    Hou, Xianxu
    Li, Hao
    Peng, Zhibin
    Shen, Linlin
    Fan, Lixin
    PATTERN RECOGNITION AND COMPUTER VISION, PRCV 2023, PT VI, 2024, 14430 : 435 - 446
  • [36] WAVELET-GUIDED ACCELERATION OF TEXT INVERSION IN DIFFUSION-BASED IMAGE EDITING
    Koo, Gwanhyeong
    Yoon, Sunjae
    Yoo, Chang D.
    2024 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, ICASSP 2024, 2024, : 4380 - 4384
  • [37] Text-Guided Sketch-to-Photo Image Synthesis
    Osahor, Uche
    Nasrabadi, Nasser M.
    IEEE ACCESS, 2022, 10 : 98278 - 98289
  • [38] TexFusion: Synthesizing 3D Textures with Text-Guided Image Diffusion Models
    Cao, Tianshi
    Kreis, Karsten
    Fidler, Sanja
    Sharp, Nicholas
    Yin, Kangxue
    2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION, ICCV, 2023, : 4146 - 4158
  • [39] Zero-Shot Contrastive Loss for Text-Guided Diffusion Image Style Transfer
    Yang, Serin
    Hwang, Hyunmin
    Ye, Jong Chul
    2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2023), 2023, : 22816 - 22825
  • [40] Enhancing Label-Efficient Medical Image Segmentation with Text-Guided Diffusion Models
    Feng, Chun-Mei
    MEDICAL IMAGE COMPUTING AND COMPUTER ASSISTED INTERVENTION - MICCAI 2024, PT VIII, 2024, 15008 : 253 - 262