Locally controllable network based on visual–linguistic relation alignment for text-to-image generation

Cited: 0
Authors
Zaike Li
Li Liu
Huaxiang Zhang
Dongmei Liu
Yu Song
Boqun Li
Affiliations
[1] Shandong Normal University, School of Information Science and Engineering
Source
Multimedia Systems | 2024 / Vol. 30
Keywords
Text-to-image generation; Image-text matching; Generative adversarial network; Local control
DOI
Not available
Abstract
Because existing locally controllable text-to-image generation methods fail to produce satisfactory detail, a novel locally controllable text-to-image generation network based on visual–linguistic relation alignment is proposed. The method aims to manipulate and generate images semantically under text guidance, and it explores the relationship between text and image to achieve local control of text-to-image generation. Visual–linguistic matching learns similarity weights between image and text from semantic features, establishing fine-grained correspondences between local image regions and words. An instance-level optimization function is introduced into the generation process to precisely control the regions with low similarity weights and combine them with text features to generate new visual attributes. In addition, a local control loss is proposed to preserve the details described by the text and the local regions of the image. Extensive experiments demonstrate the superior performance of the proposed method and its more accurate control over the original image.
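To make the fine-grained word-region correspondence concrete, the following is a minimal PyTorch sketch of cosine-similarity-based word-region weighting in the spirit of common image-text matching attention. The function name, tensor shapes, and temperature parameter are illustrative assumptions for exposition only, not the paper's actual implementation.

# Minimal sketch (not the authors' code): cosine similarities between word
# embeddings and local image-region features give per-word attention weights
# over regions. Shapes and the temperature value are illustrative assumptions.

import torch
import torch.nn.functional as F


def word_region_similarity(word_feats: torch.Tensor,
                           region_feats: torch.Tensor,
                           temperature: float = 4.0):
    """Compute fine-grained word-region similarity weights.

    word_feats:   (B, T, D)  word embeddings for T words
    region_feats: (B, R, D)  features for R local image regions
    Returns:
        attn:  (B, T, R) softmax-normalized similarity weights
        score: (B, T)    per-word matching scores (max over regions)
    """
    # L2-normalize so the dot product becomes a cosine similarity
    w = F.normalize(word_feats, dim=-1)
    r = F.normalize(region_feats, dim=-1)

    # (B, T, R) cosine similarity between every word and every region
    sim = torch.bmm(w, r.transpose(1, 2))

    # Sharpened softmax over regions gives per-word attention weights;
    # low-similarity pairs are the ones a local edit would target
    attn = F.softmax(temperature * sim, dim=-1)

    # Simple per-word matching score: best-matching region for each word
    score = sim.max(dim=-1).values
    return attn, score


if __name__ == "__main__":
    # Toy example: 2 captions of 5 words, 16 image regions, 256-d features
    words = torch.randn(2, 5, 256)
    regions = torch.randn(2, 16, 256)
    attn, score = word_region_similarity(words, regions)
    print(attn.shape, score.shape)  # torch.Size([2, 5, 16]) torch.Size([2, 5])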
Related papers
50 items in total
  • [31] Dense Text-to-Image Generation with Attention Modulation
    Kim, Yunji
    Lee, Jiyoung
    Kim, Jin-Hwa
    Ha, Jung-Woo
    Zhu, Jun-Yan
    2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION, ICCV, 2023, : 7667 - 7677
  • [32] MirrorGAN: Learning Text-to-image Generation by Redescription
    Qiao, Tingting
    Zhang, Jing
    Xu, Duanqing
    Tao, Dacheng
    2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019), 2019, : 1505 - 1514
  • [33] TextControlGAN: Text-to-Image Synthesis with Controllable Generative Adversarial Networks
    Ku, Hyeeun
    Lee, Minhyeok
    APPLIED SCIENCES-BASEL, 2023, 13 (08):
  • [34] StyleDrop: Text-to-Image Generation in Any Style
    Sohn, Kihyuk
    Ruiz, Nataniel
    Lee, Kimin
    Chin, Daniel Castro
    Blok, Irina
    Chang, Huiwen
    Barber, Jarred
    Jiang, Lu
    Entis, Glenn
    Li, Yuanzhen
    Hao, Yuan
    Essa, Irfan
    Rubinstein, Michael
    Krishnan, Dilip
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
  • [35] A taxonomy of prompt modifiers for text-to-image generation
    Oppenlaender, Jonas
    BEHAVIOUR & INFORMATION TECHNOLOGY, 2024, 43 (15) : 3763 - 3776
  • [36] Multi-Semantic Fusion Generative Adversarial Network for Text-to-Image Generation
    Huang, Pingda
    Liu, Yedan
    Fu, Chunjiang
    Zhao, Liang
    2023 IEEE 8TH INTERNATIONAL CONFERENCE ON BIG DATA ANALYTICS, ICBDA, 2023, : 159 - 164
  • [37] Text-to-Image Generation Method Based on Object Enhancement and Attention Maps
    Huang, Yongsen
    Cai, Xiaodong
    An, Yuefan
    INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2025, 16 (01) : 961 - 968
  • [38] Comparative Review of Text-to-Image Generation Techniques Based on Diffusion Models
    Gao, Xinyu
    Du, Fang
    Song, Lijuan
    COMPUTER ENGINEERING AND APPLICATIONS, 2024, 60 (24) : 44 - 64
  • [39] Artificial Intelligence-based Text-to-Image Generation of Cardiac CT
    Williams, Michelle C.
    Williams, Steven E.
    Newby, David E.
    RADIOLOGY-CARDIOTHORACIC IMAGING, 2023, 5 (02):
  • [40] Large-scale Text-to-Image Generation Models for Visual Artists' Creative Works
    Ko, Hyung-Kwon
    Park, Gwanmo
    Jeon, Hyeon
    Jo, Jaemin
    Kim, Juho
    Seo, Jinwook
    PROCEEDINGS OF 2023 28TH ANNUAL CONFERENCE ON INTELLIGENT USER INTERFACES, IUI 2023, 2023, : 919 - 933