Locally controllable network based on visual–linguistic relation alignment for text-to-image generation

Cited by: 0
Authors
Zaike Li
Li Liu
Huaxiang Zhang
Dongmei Liu
Yu Song
Boqun Li
Institution
[1] Shandong Normal University, School of Information Science and Engineering
Source
Multimedia Systems | 2024, Vol. 30, No. 1
Keywords
Text-to-image generation; Image-text matching; Generative adversarial network; Local control
DOI
Not available
Abstract
Because existing locally controllable text-to-image generation methods do not produce satisfactory fine-grained details, a novel locally controllable text-to-image generation network based on visual–linguistic relation alignment is proposed. The method aims to perform semantic image processing and generation under text guidance, exploring the relationship between text and image to achieve local control over text-to-image generation. Visual–linguistic matching learns similarity weights between image and text from semantic features, establishing fine-grained correspondences between local image regions and words. An instance-level optimization function is introduced into the generation process to precisely control the low-similarity weights and combine them with text features to generate new visual attributes. In addition, a local control loss is proposed to preserve text details and local regions of the image. Extensive experiments demonstrate the superior performance of the proposed method, which enables more accurate control over the original image.
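The abstract does not give the model's equations, so the following is only a minimal sketch of the general word-region matching idea under stated assumptions: word and image-region features are plain vectors, their similarity is cosine similarity, each word's similarities are turned into attention weights with a temperature-scaled softmax over regions, and words whose best region match falls below a fixed threshold are flagged as candidates for new visual attributes. All function names, the temperature, the threshold, and the masked L1 term (a simple stand-in for a detail-preserving loss, not the paper's local control loss) are hypothetical.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two feature vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def word_region_similarity(words: List[Vector], regions: List[Vector]) -> Matrix:
    """S[i][j] = cosine similarity between word feature i and region feature j."""
    return [[cosine(w, r) for r in regions] for w in words]


def softmax(xs: List[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable softmax; assumes xs is non-empty."""
    m = max(xs)
    exps = [math.exp((x - m) / temperature) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(sim: Matrix, temperature: float = 0.1) -> Matrix:
    """Normalize each word's similarities over regions (hypothetical temperature)."""
    return [softmax(row, temperature) for row in sim]


def low_similarity_words(sim: Matrix, threshold: float = 0.5) -> List[int]:
    """Indices of words whose best-matching region is below the (hypothetical) threshold."""
    return [i for i, row in enumerate(sim) if max(row) < threshold]


def masked_l1(original: Vector, generated: Vector, edit_mask: List[int]) -> float:
    """Mean absolute change at positions outside the edit mask (mask 1 = editable).

    Stand-in for a detail-preserving term, not the paper's local control loss.
    """
    kept = [abs(o - g) for o, g, m in zip(original, generated, edit_mask) if m == 0]
    return sum(kept) / len(kept) if kept else 0.0


if __name__ == "__main__":
    words = [[1.0, 0.0], [0.0, 1.0]]     # toy features for two words
    regions = [[0.9, 0.1], [0.8, -0.2]]  # toy features for two image regions
    sim = word_region_similarity(words, regions)
    print(attention_weights(sim))
    print(low_similarity_words(sim))  # flags word 1, which matches no region well
    print(masked_l1([0.2, 0.4, 0.6, 0.8], [0.2, 0.9, 0.6, 0.7], [0, 1, 0, 0]))
```

In this toy example the change at the editable position (index 1) is ignored by `masked_l1`, so only the small drift at index 3 is penalized; a real model would operate on learned feature maps and image tensors rather than short lists.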