Norm-guided latent space exploration for text-to-image generation

Cited by: 0
Authors
Samuel, Dvir [1 ,2 ]
Ben-Ari, Rami [2 ]
Darshan, Nir [2 ]
Maron, Haggai [3 ,4 ]
Chechik, Gal [1 ,4 ]
Affiliations
[1] Bar Ilan Univ, Ramat Gan, Israel
[2] OriginAI, Tel Aviv, Israel
[3] Technion, Haifa, Israel
[4] NVIDIA Res, Tel Aviv, Israel
Keywords
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline codes
081104; 0812; 0835; 1405
Abstract
Text-to-image diffusion models show great potential in synthesizing a large variety of concepts in new compositions and scenarios. However, the latent space of initial seeds is still not well understood, and its structure has been shown to impact the generation of various concepts. Specifically, simple operations like interpolation and finding the centroid of a set of seeds perform poorly when using standard Euclidean or spherical metrics in the latent space. This paper makes the observation that, under current training procedures, diffusion models observe inputs with only a narrow range of norm values. This has strong implications for methods that rely on seed manipulation for image generation, with applications to few-shot and long-tail learning tasks. To address this issue, we propose a novel method for interpolating between two seeds and demonstrate that it defines a new non-Euclidean metric that takes into account a norm-based prior on seeds. We describe a simple yet efficient algorithm for approximating this interpolation procedure and use it to further define centroids in the latent seed space. We show that our new interpolation and centroid techniques significantly enhance the generation of rare concept images. This further leads to state-of-the-art performance on few-shot and long-tail benchmarks, improving on prior approaches in terms of generation speed, image quality, and semantic content.
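The norm prior behind this observation can be illustrated concretely: a d-dimensional standard Gaussian seed has a norm concentrated near sqrt(d), so plain Euclidean averaging of seeds yields norms the model rarely saw during training. The sketch below is a simplified illustration of that idea, not the paper's actual metric or geodesic approximation; the function names norm_preserving_slerp and norm_guided_centroid and the rescale-to-sqrt(d) shortcut are assumptions made here for illustration.

import numpy as np


def norm_preserving_slerp(z0, z1, t):
    """Interpolate between two Gaussian seeds without leaving the norm range
    that standard-normal samples concentrate in (roughly sqrt(d) for d dims).

    Linear interpolation shrinks the norm toward the origin; spherical
    interpolation keeps the direction sensible but ignores the norm prior.
    Here we slerp the directions and then rescale to sqrt(d).
    """
    z0 = np.asarray(z0, dtype=np.float64).ravel()
    z1 = np.asarray(z1, dtype=np.float64).ravel()
    d = z0.size

    # Spherical interpolation between the normalized seeds.
    u0 = z0 / np.linalg.norm(z0)
    u1 = z1 / np.linalg.norm(z1)
    omega = np.arccos(np.clip(np.dot(u0, u1), -1.0, 1.0))
    if np.isclose(omega, 0.0):
        direction = u0
    else:
        direction = (np.sin((1.0 - t) * omega) * u0
                     + np.sin(t * omega) * u1) / np.sin(omega)

    # Project onto the shell where i.i.d. N(0, 1) seeds concentrate.
    return np.sqrt(d) * direction


def norm_guided_centroid(seeds):
    """Naive centroid of several seeds: the Euclidean mean collapses toward
    the origin (an atypical norm), so rescale it back to sqrt(d)."""
    stacked = np.stack([np.asarray(s, dtype=np.float64).ravel() for s in seeds])
    mean = stacked.mean(axis=0)
    return np.sqrt(mean.size) * mean / np.linalg.norm(mean)


if __name__ == "__main__":
    d = 4 * 64 * 64                      # e.g. a flattened 4x64x64 latent
    rng = np.random.default_rng(0)
    z0, z1 = rng.standard_normal(d), rng.standard_normal(d)

    z_mid = norm_preserving_slerp(z0, z1, 0.5)
    z_avg = 0.5 * (z0 + z1)              # plain averaging, for comparison

    print(np.linalg.norm(z0), np.linalg.norm(z_mid), np.linalg.norm(z_avg))
    # ~128, ~128, ~90: the plain average leaves the typical-norm shell.

Rescaling to sqrt(d) is only a crude proxy for the norm-based prior; as the abstract states, the paper's method instead defines a non-Euclidean metric from that prior and gives an efficient algorithm for approximating the resulting interpolation and centroids.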
Pages: 13