Text to image synthesis with multi-granularity feature aware enhancement Generative Adversarial Networks

Cited by: 0
Authors
Dong, Pei [1]
Wu, Lei [1]
Li, Ruichen [1]
Meng, Xiangxu [1]
Meng, Lei [1]
Affiliations
[1] Shandong Univ, Sch Software, 1500 ShunHua Rd High Tech Ind Dev Zone, Jinan 250101, Peoples R China
Keywords
Generative adversarial network; Multi-granularity feature aware enhancement; Text-to-image; Autoregressive; Diffusion;
DOI
10.1016/j.cviu.2024.104042
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Synthesizing complex images from text is challenging. Compared to autoregressive and diffusion model-based methods, Generative Adversarial Network (GAN)-based methods have significant advantages in computational cost and generation efficiency, yet two limitations remain: first, these methods often refine all features output from the previous stage indiscriminately, without considering that these features are initialized gradually during the generation process; second, the sparse semantic constraints provided by the text description are typically ineffective for refining fine-grained features. These issues complicate the balance between generation quality, computational cost, and inference speed. To address them, we propose a Multi-granularity Feature Aware Enhancement GAN (MFAE-GAN), which allows the refinement process to match the order in which features of different granularity are initialized. Specifically, MFAE-GAN (1) samples category-related coarse-grained features and instance-level, detail-related fine-grained features at different generation stages, using different attention mechanisms in Coarse-grained Feature Enhancement (CFE) and Fine-grained Feature Enhancement (FFE), to guide the generation process spatially; (2) provides denser semantic constraints than textual semantic information alone through Multi-granularity Features Adaptive Batch Normalization (MFA-BN) when refining fine-grained features; and (3) adopts Global Semantics Preservation (GSP) to avoid losing global semantics when features are sampled continuously. Extensive experimental results demonstrate that MFAE-GAN is competitive in both image generation quality and efficiency.
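The abstract does not spell out how MFA-BN is computed. As a rough illustration of the general idea behind feature-conditioned batch normalization, the sketch below normalizes each channel over the batch and then applies a per-sample scale and shift predicted from a conditioning feature vector. Every name here (`linear`, `conditional_batch_norm`, `w_gamma`, `w_beta`, and so on) is hypothetical, and the computation is generic conditional batch normalization, not the paper's actual MFA-BN layer.

```python
import math


def linear(vec, weight, bias):
    """Affine map; `weight` holds one row of coefficients per output unit."""
    return [sum(w * v for w, v in zip(row, vec)) + b
            for row, b in zip(weight, bias)]


def conditional_batch_norm(x, cond, w_gamma, b_gamma, w_beta, b_beta, eps=1e-5):
    """Generic conditional batch norm (an assumption, not the paper's MFA-BN).

    x:    activations indexed as [batch][channel][position]
    cond: conditioning features indexed as [batch][cond_dim]
    The scale offset (gamma) and shift (beta) are predicted per sample
    and per channel from `cond` by two hypothetical linear layers.
    """
    batch, channels = len(x), len(x[0])
    gammas = [linear(c, w_gamma, b_gamma) for c in cond]
    betas = [linear(c, w_beta, b_beta) for c in cond]
    out = [[None] * channels for _ in range(batch)]
    for ch in range(channels):
        # Batch statistics for this channel, pooled over samples and positions.
        values = [v for n in range(batch) for v in x[n][ch]]
        mean = sum(values) / len(values)
        var = sum((v - mean) ** 2 for v in values) / len(values)
        inv_std = 1.0 / math.sqrt(var + eps)
        for n in range(batch):
            scale = 1.0 + gammas[n][ch]  # identity scale when gamma is 0
            out[n][ch] = [scale * (v - mean) * inv_std + betas[n][ch]
                          for v in x[n][ch]]
    return out
```

In a text-to-image generator, the conditioning vector would typically come from a text embedding; the abstract suggests that MFA-BN instead conditions on multi-granularity features to supply denser constraints, but the exact inputs and layer design are not given here.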
Pages: 11
Related papers
50 in total
  • [1] SAW-GAN: Multi-granularity Text Fusion Generative Adversarial Networks for text-to-image generation
    Jin, Dehu
    Yu, Qi
    Yu, Lan
    Qi, Meng
    KNOWLEDGE-BASED SYSTEMS, 2024, 294
  • [2] Multi-granularity generative adversarial nets with reconstructive sampling for image inpainting
    Xu, Liming
    Zeng, Xianhua
    Li, Weisheng
    Huang, Zhiwei
    NEUROCOMPUTING, 2020, 402 : 220 - 234
  • [3] TEXT TO IMAGE SYNTHESIS WITH ERUDITE GENERATIVE ADVERSARIAL NETWORKS
    Zhang, Zhiqiang
    Yu, Wenxin
    Jiang, Ning
    Zhou, Jinjia
    2021 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), 2021, : 2438 - 2442
  • [4] Text to image synthesis using multi-generator text conditioned generative adversarial networks
    Min Zhang
    Chunye Li
    Zhiping Zhou
    Multimedia Tools and Applications, 2021, 80 : 7789 - 7803
  • [5] Text to image synthesis using multi-generator text conditioned generative adversarial networks
    Zhang, Min
    Li, Chunye
    Zhou, Zhiping
    MULTIMEDIA TOOLS AND APPLICATIONS, 2021, 80 (05) : 7789 - 7803
  • [6] Historical Text Image Enhancement Using Image Scaling and Generative Adversarial Networks
    Khan, Sajid Ullah
    Ullah, Imdad
    Khan, Faheem
    Lee, Youngmoon
    Ullah, Shahid
    SENSORS, 2023, 23 (08)
  • [7] Multi-granularity Feature Attention Fusion Network for Image-Text Sentiment Analysis
    Sun, Tao
    Wang, Shuang
    Zhong, Shenjie
    ADVANCES IN COMPUTER GRAPHICS, CGI 2022, 2022, 13443 : 3 - 14
  • [8] DRAWGAN: TEXT TO IMAGE SYNTHESIS WITH DRAWING GENERATIVE ADVERSARIAL NETWORKS
    Zhang, Zhiqiang
    Zhou, Jinjia
    Yu, Wenxin
    Jiang, Ning
    2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 4195 - 4199
  • [9] Generative Adversarial Text to Image Synthesis
    Reed, Scott
    Akata, Zeynep
    Yan, Xinchen
    Logeswaran, Lajanugen
    Schiele, Bernt
    Lee, Honglak
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 48, 2016, 48
  • [10] Multi-Frequency Feature Enhancement for Multi-Granularity Visual Classification
    Fu, Meijiang
    Zheng, Yixiao
    Chang, Dongliang
    Li, Wenpan
    Ma, Zhanyu
    2023 ASIA PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE, APSIPA ASC, 2023, : 484 - 489