Attribute-Centric Compositional Text-to-Image Generation

被引:0
|
作者
Cong, Yuren [1 ]
Min, Martin Renqiang [2 ]
Li, Li Erran [3 ]
Rosenhahn, Bodo [1 ]
Yang, Michael Ying [4 ]
机构
[1] Leibniz Univ Hannover, Inst Informat Proc, Hannover, Germany
[2] NEC Labs Amer, Princeton, NJ USA
[3] Amazon, AWS AI, San Francisco, CA USA
[4] Univ Bath, Visual Comp Grp, Bath, England
关键词
Text-to-image; Compositional generation; Attribute-centric;
D O I
10.1007/s11263-025-02371-0
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
Despite the recent impressive breakthroughs in text-to-image generation, generative models have difficulty in capturing the data distribution of underrepresented attribute compositions while over-memorizing overrepresented attribute compositions, which raises public concerns about their robustness and fairness. To tackle this challenge, we propose ACTIG, an attribute-centric compositional text-to-image generation framework. We present an attribute-centric feature augmentation and a novel image-free training scheme, which greatly improves model's ability to generate images with underrepresented attributes. We further propose an attribute-centric contrastive loss to avoid overfitting to overrepresented attribute compositions. We validate our framework on the CelebA-HQ and CUB datasets. Extensive experiments show that the compositional generalization of ACTIG is outstanding, and our framework outperforms previous works in terms of image quality and text-image consistency. The source code and trained models are publicly available at https://github.com/yrcong/ACTIG.
引用
收藏
页数:16
相关论文
共 50 条
  • [1] Attribute-Centric Referring Expression Generation
    Dale, Robert
    Viethen, Jette
    EMPIRICAL METHODS IN NATURAL LANGUAGE GENERATION: DATA-ORIENTED METHODS AND EMPIRICAL EVALUATION, 2010, 5790 : 163 - 179
  • [2] Controllable Text-to-Image Generation
    Li, Bowen
    Qi, Xiaojuan
    Lukasiewicz, Thomas
    Torr, Philip H. S.
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 32 (NIPS 2019), 2019, 32
  • [3] Surgical text-to-image generation
    Nwoye, Chinedu Innocent
    Bose, Rupak
    Elgohary, Kareem
    Arboit, Lorenzo
    Carlino, Giorgio
    Lavanchy, Joel L.
    Mascagni, Pietro
    Padoy, Nicolas
    PATTERN RECOGNITION LETTERS, 2025, 190 : 73 - 80
  • [4] Cola: A Benchmark for Compositional Text-to-image Retrieval
    Ray, Arijit
    Radenovic, Filip
    Dubey, Abhimanyu
    Plummer, Bryan A.
    Krishna, Ranjay
    Saenko, Kate
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
  • [5] Expressive Text-to-Image Generation with Rich Text
    Ge, Songwei
    Park, Taesung
    Zhu, Jun-Yan
    Huang, Jia-Bin
    2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION, ICCV, 2023, : 7511 - 7522
  • [6] SEMANTICALLY INVARIANT TEXT-TO-IMAGE GENERATION
    Sah, Shagan
    Peri, Dheeraj
    Shringi, Ameya
    Zhang, Chi
    Dominguez, Miguel
    Savakis, Andreas
    Ptucha, Ray
    2018 25TH IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), 2018, : 3783 - 3787
  • [7] AMM-GAN: Attribute-Matching Memory for Person Text-to-Image Generation
    Yue, Wei
    PATTERN RECOGNITION AND COMPUTER VISION, PRCV 2023, PT I, 2024, 14425 : 146 - 158
  • [8] Semantics Disentangling for Text-to-Image Generation
    Yin, Guojun
    Liu, Bin
    Sheng, Lu
    Yu, Nenghai
    Wang, Xiaogang
    Shao, Jing
    2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019), 2019, : 2322 - 2331
  • [9] Text-to-Image Generation for Abstract Concepts
    Liao, Jiayi
    Chen, Xu
    Fu, Qiang
    Du, Lun
    He, Xiangnan
    Wang, Xiang
    Han, Shi
    Zhang, Dongmei
    THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 4, 2024, : 3360 - 3368
  • [10] Shifted Diffusion for Text-to-image Generation
    Zhou, Yufan
    Liu, Bingchen
    Zhu, Yizhe
    Yang, Xiao
    Chen, Changyou
    Xu, Jinhui
    2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2023, : 10157 - 10166