Locally controllable network based on visual–linguistic relation alignment for text-to-image generation

Cited by: 0
Authors
Zaike Li
Li Liu
Huaxiang Zhang
Dongmei Liu
Yu Song
Boqun Li
Affiliation
[1] Shandong Normal University, School of Information Science and Engineering
Source
Multimedia Systems | 2024 / Vol. 30
Keywords
Text-to-image generation; Image-text matching; Generative adversarial network; Local control
DOI
Not available
Abstract
Existing locally controllable text-to-image generation methods do not achieve satisfactory results in fine detail. To address this, a novel locally controllable text-to-image generation network based on visual–linguistic relation alignment is proposed. The goal of the method is to perform image processing and generation semantically under text guidance, exploiting the relationship between text and image to achieve local control. Visual–linguistic matching learns similarity weights between image and text from their semantic features, yielding a fine-grained correspondence between local image regions and words. An instance-level optimization function is introduced into the generation process to accurately control the low-similarity weights, which are combined with text features to generate new visual attributes. In addition, a local control loss is proposed to preserve the details described by the text and the local regions of the image. Extensive experiments demonstrate the superior performance of the proposed method, which enables more accurate control of the original image.
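The abstract does not give the matching formula, so the following is only a minimal sketch of one common way to obtain word-to-region similarity weights: cosine similarity followed by a softmax over regions. The function names, the temperature parameter, and the threshold used to pick low-similarity regions are all hypothetical and are not taken from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-8)  # small epsilon avoids division by zero


def word_region_weights(word_feats, region_feats, temperature=1.0):
    """Return one row per word: softmax-normalized similarity to each region.

    Assumption: word and region features already live in a shared
    embedding space of the same dimension.
    """
    weights = []
    for w in word_feats:
        sims = [cosine(w, r) / temperature for r in region_feats]
        m = max(sims)  # subtract the max for numerical stability
        exps = [math.exp(s - m) for s in sims]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights


def low_similarity_regions(weights, word_index, threshold):
    """Indices of regions whose weight for the given word is below threshold.

    Hypothetical helper: illustrates selecting weakly aligned regions,
    which the abstract says are the ones to be controlled.
    """
    return [j for j, w in enumerate(weights[word_index]) if w < threshold]


# Toy features: two words and three image regions in a 2-D space.
words = [[1.0, 0.0], [0.0, 1.0]]
regions = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weights = word_region_weights(words, regions)
weak = low_similarity_regions(weights, word_index=0, threshold=0.2)
```

In this toy setup each row of weights sums to 1, and each word places its largest weight on the region whose feature vector points in the same direction. How the paper actually uses the low-similarity weights during generation is not specified in the abstract.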