Locally controllable network based on visual–linguistic relation alignment for text-to-image generation

Authors
Zaike Li
Li Liu
Huaxiang Zhang
Dongmei Liu
Yu Song
Boqun Li
Affiliation
[1] School of Information Science and Engineering, Shandong Normal University
Source
Multimedia Systems | 2024 / Vol. 30
Keywords
Text-to-image generation; Image-text matching; Generative adversarial network; Local control
DOI
Not available
Abstract
Because existing locally controllable text-to-image generation does not yet achieve satisfactory results in fine detail, a novel locally controllable text-to-image generation network based on visual–linguistic relation alignment is proposed. The method aims to process and generate images semantically under text guidance, and it explores the relationship between text and image to achieve local control of the generated result. A visual–linguistic matching module learns similarity weights between image and text from their semantic features, establishing a fine-grained correspondence between local image regions and words. An instance-level optimization function is introduced into the generation process to accurately control the low-similarity weights, which are combined with text features to generate new visual attributes. In addition, a local control loss is proposed to preserve the details of the text and of local image regions. Extensive experiments demonstrate the superior performance of the proposed method and show that it enables more accurate control of the original image.
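
The abstract does not give the formulas behind the visual–linguistic matching, so the following is only a minimal sketch of one common way to compute word-region similarity weights: cosine similarity between region and word features, normalized with a softmax. The function names, the temperature, the threshold, and the toy feature vectors are all assumptions made for illustration and are not taken from the paper.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-8)  # epsilon avoids division by zero

def softmax(scores, temperature=1.0):
    """Numerically stable softmax with a temperature (assumed hyperparameter)."""
    top = max(scores)
    exps = [math.exp((s - top) / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def word_region_weights(region_feats, word_feats, temperature=0.1):
    """For each image region, return normalized similarity weights over words."""
    weights = []
    for region in region_feats:
        sims = [cosine(region, word) for word in word_feats]
        weights.append(softmax(sims, temperature))
    return weights

def low_similarity_regions(region_feats, word_feats, threshold=0.5):
    """Indices of regions whose best cosine similarity to any word is below
    the (hypothetical) threshold, i.e. candidates for text-guided change."""
    selected = []
    for index, region in enumerate(region_feats):
        best = max(cosine(region, word) for word in word_feats)
        if best < threshold:
            selected.append(index)
    return selected

# Toy 2-D features: three image regions and two words (illustrative values).
regions = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
words = [[1.0, 0.0], [0.0, 1.0]]

weights = word_region_weights(regions, words)
low = low_similarity_regions(regions, words)
```

In this toy setup the first region aligns almost entirely with the first word, while the third region matches neither word well and is the only one flagged as low-similarity. In a scheme like the one the abstract describes, such regions would be the ones combined with text features to produce new visual attributes.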