Attribute-Centric Compositional Text-to-Image Generation

Citations: 0

Authors:

Cong, Yuren [1]

Min, Martin Renqiang [2]

Li, Li Erran [3]

Rosenhahn, Bodo [1]

Yang, Michael Ying [4]

Affiliations:

[1] Leibniz Univ Hannover, Inst Informat Proc, Hannover, Germany

[2] NEC Labs Amer, Princeton, NJ USA

[3] Amazon, AWS AI, San Francisco, CA USA

[4] Univ Bath, Visual Comp Grp, Bath, England

Source:

INTERNATIONAL JOURNAL OF COMPUTER VISION | 2025

Keywords:

Text-to-image; Compositional generation; Attribute-centric

DOI:

10.1007/s11263-025-02371-0

Chinese Library Classification:

TP18 [Artificial intelligence theory]

Subject classification codes:

081104; 0812; 0835; 1405

Abstract:

Despite the recent impressive breakthroughs in text-to-image generation, generative models have difficulty capturing the data distribution of underrepresented attribute compositions while over-memorizing overrepresented attribute compositions, which raises public concerns about their robustness and fairness. To tackle this challenge, we propose ACTIG, an attribute-centric compositional text-to-image generation framework. We present an attribute-centric feature augmentation and a novel image-free training scheme, which greatly improve the model's ability to generate images with underrepresented attributes. We further propose an attribute-centric contrastive loss to avoid overfitting to overrepresented attribute compositions. We validate our framework on the CelebA-HQ and CUB datasets. Extensive experiments show that the compositional generalization of ACTIG is outstanding, and our framework outperforms previous works in terms of image quality and text-image consistency. The source code and trained models are publicly available at https://github.com/yrcong/ACTIG.


Pages: 16

50 items in total

[1] Attribute-Centric Referring Expression Generation
Dale, Robert
Viethen, Jette
EMPIRICAL METHODS IN NATURAL LANGUAGE GENERATION: DATA-ORIENTED METHODS AND EMPIRICAL EVALUATION, 2010, 5790 : 163 - 179
[2] Controllable Text-to-Image Generation
Li, Bowen
Qi, Xiaojuan
Lukasiewicz, Thomas
Torr, Philip H. S.
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 32 (NIPS 2019), 2019, 32
[3] Surgical text-to-image generation
Nwoye, Chinedu Innocent
Bose, Rupak
Elgohary, Kareem
Arboit, Lorenzo
Carlino, Giorgio
Lavanchy, Joel L.
Mascagni, Pietro
Padoy, Nicolas
PATTERN RECOGNITION LETTERS, 2025, 190 : 73 - 80
[4] Cola: A Benchmark for Compositional Text-to-image Retrieval
Ray, Arijit
Radenovic, Filip
Dubey, Abhimanyu
Plummer, Bryan A.
Krishna, Ranjay
Saenko, Kate
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
[5] Expressive Text-to-Image Generation with Rich Text
Ge, Songwei
Park, Taesung
Zhu, Jun-Yan
Huang, Jia-Bin
2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION, ICCV, 2023, : 7511 - 7522
[6] SEMANTICALLY INVARIANT TEXT-TO-IMAGE GENERATION
Sah, Shagan
Peri, Dheeraj
Shringi, Ameya
Zhang, Chi
Dominguez, Miguel
Savakis, Andreas
Ptucha, Ray
2018 25TH IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), 2018, : 3783 - 3787
[7] AMM-GAN: Attribute-Matching Memory for Person Text-to-Image Generation
Yue, Wei
PATTERN RECOGNITION AND COMPUTER VISION, PRCV 2023, PT I, 2024, 14425 : 146 - 158
[8] Semantics Disentangling for Text-to-Image Generation
Yin, Guojun
Liu, Bin
Sheng, Lu
Yu, Nenghai
Wang, Xiaogang
Shao, Jing
2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019), 2019, : 2322 - 2331
[9] Text-to-Image Generation for Abstract Concepts
Liao, Jiayi
Chen, Xu
Fu, Qiang
Du, Lun
He, Xiangnan
Wang, Xiang
Han, Shi
Zhang, Dongmei
THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 4, 2024, : 3360 - 3368
[10] Shifted Diffusion for Text-to-image Generation
Zhou, Yufan
Liu, Bingchen
Zhu, Yizhe
Yang, Xiao
Chen, Changyou
Xu, Jinhui
2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2023, : 10157 - 10166
