Attribute-Centric Compositional Text-to-Image Generation

Authors
Cong, Yuren [1]
Min, Martin Renqiang [2]
Li, Li Erran [3]
Rosenhahn, Bodo [1]
Yang, Michael Ying [4]
Affiliations
[1] Leibniz Univ Hannover, Inst Informat Proc, Hannover, Germany
[2] NEC Labs Amer, Princeton, NJ USA
[3] Amazon, AWS AI, San Francisco, CA USA
[4] Univ Bath, Visual Comp Grp, Bath, England
Keywords
Text-to-image; Compositional generation; Attribute-centric
DOI
10.1007/s11263-025-02371-0
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Despite the recent impressive breakthroughs in text-to-image generation, generative models have difficulty capturing the data distribution of underrepresented attribute compositions while over-memorizing overrepresented attribute compositions, which raises public concerns about their robustness and fairness. To tackle this challenge, we propose ACTIG, an attribute-centric compositional text-to-image generation framework. We present an attribute-centric feature augmentation and a novel image-free training scheme, which greatly improve the model's ability to generate images with underrepresented attributes. We further propose an attribute-centric contrastive loss to avoid overfitting to overrepresented attribute compositions. We validate our framework on the CelebA-HQ and CUB datasets. Extensive experiments show that the compositional generalization of ACTIG is outstanding, and our framework outperforms previous works in terms of image quality and text-image consistency. The source code and trained models are publicly available at https://github.com/yrcong/ACTIG.
Pages: 16