Norm-guided latent space exploration for text-to-image generation

Cited by: 0
Authors
Samuel, Dvir [1 ,2 ]
Ben-Ari, Rami [2 ]
Darshan, Nir [2 ]
Maron, Haggai [3 ,4 ]
Chechik, Gal [1 ,4 ]
Affiliations
[1] Bar Ilan Univ, Ramat Gan, Israel
[2] OriginAI, Tel Aviv, Israel
[3] Technion, Haifa, Israel
[4] NVIDIA Res, Tel Aviv, Israel
Keywords
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline codes
081104; 0812; 0835; 1405
Abstract
Text-to-image diffusion models show great potential in synthesizing a large variety of concepts in new compositions and scenarios. However, the latent space of initial seeds is still not well understood, and its structure has been shown to impact the generation of various concepts. Specifically, simple operations like interpolation and finding the centroid of a set of seeds perform poorly when using standard Euclidean or spherical metrics in the latent space. This paper makes the observation that, under current training procedures, diffusion models observe inputs with only a narrow range of norm values. This has strong implications for methods that rely on seed manipulation for image generation, with applications to few-shot and long-tail learning tasks. To address this issue, we propose a novel method for interpolating between two seeds and demonstrate that it defines a new non-Euclidean metric that takes into account a norm-based prior on seeds. We describe a simple yet efficient algorithm for approximating this interpolation procedure and use it to further define centroids in the latent seed space. We show that our new interpolation and centroid techniques significantly enhance the generation of rare concept images. This further leads to state-of-the-art performance on few-shot and long-tail benchmarks, improving on prior approaches in terms of generation speed, image quality, and semantic content.
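The norm prior behind this observation can be illustrated concretely: a d-dimensional standard Gaussian seed has a norm concentrated near sqrt(d), so plain Euclidean averaging of seeds yields norms the model rarely saw during training. The sketch below is a simplified illustration of that idea, not the paper's actual metric or geodesic approximation; the function names norm_preserving_slerp and norm_guided_centroid and the rescale-to-sqrt(d) shortcut are assumptions made here for illustration.

import numpy as np


def norm_preserving_slerp(z0, z1, t):
    """Interpolate between two Gaussian seeds without leaving the norm range
    that standard-normal samples concentrate in (roughly sqrt(d) for d dims).

    Linear interpolation shrinks the norm toward the origin; spherical
    interpolation keeps the direction sensible but ignores the norm prior.
    Here we slerp the directions and then rescale to sqrt(d).
    """
    z0 = np.asarray(z0, dtype=np.float64).ravel()
    z1 = np.asarray(z1, dtype=np.float64).ravel()
    d = z0.size

    # Spherical interpolation between the normalized seeds.
    u0 = z0 / np.linalg.norm(z0)
    u1 = z1 / np.linalg.norm(z1)
    omega = np.arccos(np.clip(np.dot(u0, u1), -1.0, 1.0))
    if np.isclose(omega, 0.0):
        direction = u0
    else:
        direction = (np.sin((1.0 - t) * omega) * u0
                     + np.sin(t * omega) * u1) / np.sin(omega)

    # Project onto the shell where i.i.d. N(0, 1) seeds concentrate.
    return np.sqrt(d) * direction


def norm_guided_centroid(seeds):
    """Naive centroid of several seeds: the Euclidean mean collapses toward
    the origin (an atypical norm), so rescale it back to sqrt(d)."""
    stacked = np.stack([np.asarray(s, dtype=np.float64).ravel() for s in seeds])
    mean = stacked.mean(axis=0)
    return np.sqrt(mean.size) * mean / np.linalg.norm(mean)


if __name__ == "__main__":
    d = 4 * 64 * 64                      # e.g. a flattened 4x64x64 latent
    rng = np.random.default_rng(0)
    z0, z1 = rng.standard_normal(d), rng.standard_normal(d)

    z_mid = norm_preserving_slerp(z0, z1, 0.5)
    z_avg = 0.5 * (z0 + z1)              # plain averaging, for comparison

    print(np.linalg.norm(z0), np.linalg.norm(z_mid), np.linalg.norm(z_avg))
    # ~128, ~128, ~90: the plain average leaves the typical-norm shell.

Rescaling to sqrt(d) is only a crude proxy for the norm-based prior; as the abstract states, the paper's method instead defines a non-Euclidean metric from that prior and gives an efficient algorithm for approximating the resulting interpolation and centroids.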
Pages: 13