Attribute-Centric Compositional Text-to-Image Generation

Citations: 0
Authors
Cong, Yuren [1]
Min, Martin Renqiang [2]
Li, Li Erran [3]
Rosenhahn, Bodo [1]
Yang, Michael Ying [4]
Affiliations
[1] Leibniz Univ Hannover, Inst Informat Proc, Hannover, Germany
[2] NEC Labs Amer, Princeton, NJ USA
[3] Amazon, AWS AI, San Francisco, CA USA
[4] Univ Bath, Visual Comp Grp, Bath, England
Keywords
Text-to-image; Compositional generation; Attribute-centric
DOI
10.1007/s11263-025-02371-0
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Despite the recent impressive breakthroughs in text-to-image generation, generative models have difficulty capturing the data distribution of underrepresented attribute compositions while over-memorizing overrepresented attribute compositions, which raises public concerns about their robustness and fairness. To tackle this challenge, we propose ACTIG, an attribute-centric compositional text-to-image generation framework. We present an attribute-centric feature augmentation and a novel image-free training scheme, which greatly improve the model's ability to generate images with underrepresented attributes. We further propose an attribute-centric contrastive loss to avoid overfitting to overrepresented attribute compositions. We validate our framework on the CelebA-HQ and CUB datasets. Extensive experiments show that ACTIG achieves outstanding compositional generalization and that our framework outperforms previous work in terms of image quality and text-image consistency. The source code and trained models are publicly available at https://github.com/yrcong/ACTIG.
Pages: 16
Related papers
50 in total
  • [41] MobileDiffusion: Instant Text-to-Image Generation on Mobile Devices
    Zhao, Yang
    Xu, Yanwu
    Xiao, Zhisheng
    Jia, Haolin
    Hou, Tingbo
    COMPUTER VISION - ECCV 2024, PT LXII, 2025, 15120 : 225 - 242
  • [42] Social Biases through the Text-to-Image Generation Lens
    Naik, Ranjita
    Nushi, Besmira
    PROCEEDINGS OF THE 2023 AAAI/ACM CONFERENCE ON AI, ETHICS, AND SOCIETY, AIES 2023, 2023, : 786 - 808
  • [43] HARIVO: Harnessing Text-to-Image Models for Video Generation
    Kwon, Mingi
    Oh, Seoung Wug
    Zhou, Yang
    Liu, Difan
    Lee, Joon-Young
    Cai, Haoran
    Liu, Baqiao
    Liu, Feng
    Uh, Youngjung
    COMPUTER VISION - ECCV 2024, PT LIII, 2025, 15111 : 19 - 36
  • [44] ITI-GEN: Inclusive Text-to-Image Generation
    Zhang, Cheng
    Chen, Xuanbai
    Chai, Siqi
    Wu, Chen Henry
    Lagun, Dmitry
    Beeler, Thabo
    De la Torre, Fernando
    2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION, ICCV, 2023, : 3946 - 3957
  • [45] Translation-Enhanced Multilingual Text-to-Image Generation
    Li, Yaoyiran
    Chang, Ching-Yun
    Rawls, Stephen
    Vulic, Ivan
    Korhonen, Anna
    PROCEEDINGS OF THE 61ST ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2023): LONG PAPERS, VOL 1, 2023, : 9174 - 9193
  • [46] Training-Free Consistent Text-to-Image Generation
    Tewel, Yoad
    Kaduri, Omri
    Gal, Rinon
    Kasten, Yoni
    Wolf, Lior
    Chechik, Gal
    Atzmon, Yuval
    ACM TRANSACTIONS ON GRAPHICS, 2024, 43 (04):
  • [47] Text-to-image generation combined with mutual information maximization
    Mo J.
    Xu K.
    Lin L.
    Ouyang N.
    Xi'an Dianzi Keji Daxue Xuebao/Journal of Xidian University, 2019, 46 (05): 180 - 188
  • [48] EmoGen: Emotional Image Content Generation with Text-to-Image Diffusion Models
    Yang, Jingyuan
    Feng, Jiawei
    Huang, Hui
    2024 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2024, 2024, : 6358 - 6368
  • [49] T2I-CompBench: A Comprehensive Benchmark for Open-world Compositional Text-to-image Generation
    Huang, Kaiyi
    Sun, Kaiyue
    Xie, Enze
    Li, Zhenguo
    Liu, Xihui
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
  • [50] Background Layout Generation and Object Knowledge Transfer for Text-to-Image Generation
    Chen, Zhuowei
    Mao, Zhendong
    Fang, Shancheng
    Hu, Bo
    PROCEEDINGS OF THE 30TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2022, 2022, : 4327 - 4335