TRAINING-FREE LOCATION-AWARE TEXT-TO-IMAGE SYNTHESIS

Cited by: 2
Authors
Mao, Jiafeng [1]
Wang, Xueting [2]
Affiliations
[1] Univ Tokyo, Dept Informat & Commun Engn, Tokyo, Japan
[2] CyberAgent Inc, AI Lab, Tokyo, Japan
Keywords
diffusion model; text-to-image synthesis
DOI
10.1109/ICIP49359.2023.10222616
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Current large-scale generative models are impressively effective at generating high-quality images from text prompts. However, they lack the ability to precisely control the size and position of objects in the generated image. In this study, we analyze the generative mechanism of the Stable Diffusion model and propose a new interactive generation paradigm that allows users to specify the position of generated objects without additional training. Moreover, we propose an object-detection-based evaluation metric to assess control capability in the location-aware generation task. Our experimental results show that our method outperforms state-of-the-art methods in both control capacity and image quality.
Pages: 995-999
Number of pages: 5
Related papers
50 in total
  • [1] Training-Free Consistent Text-to-Image Generation
    Tewel, Yoad
    Kaduri, Omri
    Gal, Rinon
    Kasten, Yoni
    Wolf, Lior
    Chechik, Gal
    Atzmon, Yuval
    ACM TRANSACTIONS ON GRAPHICS, 2024, 43 (04)
  • [2] BoxDiff: Text-to-Image Synthesis with Training-Free Box-Constrained Diffusion
    Xie, Jinheng
    Li, Yuexiang
    Huang, Yawen
    Liu, Haozhe
    Zhang, Wentian
    Zheng, Yefeng
    Shou, Mike Zheng
    2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION, ICCV, 2023: 7417-7427
  • [3] Zero-Painter: Training-Free Layout Control for Text-to-Image Synthesis
    Ohanyan, Marianna
    Manukyan, Hayk
    Wang, Zhangyang
    Navasardyan, Shant
    Shi, Humphrey
    2024 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2024, 2024: 8764-8774
  • [4] Training-free Diffusion Model Adaptation for Variable-Sized Text-to-Image Synthesis
    Jin, Zhiyu
    Shen, Xuli
    Li, Bin
    Xue, Xiangyang
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023
  • [5] RECON: Training-Free Acceleration for Text-to-Image Synthesis with Retrieval of Concept Prompt Trajectories
    Lu, Chen-Yi
    Agarwal, Shubham
    Tanjim, Md Mehrab
    Mahadik, Kanak
    Rao, Anup
    Mitra, Subrata
    Saini, Shiv Kumar
    Bagchi, Saurabh
    Chaterji, Somali
    COMPUTER VISION - ECCV 2024, PT LIX, 2025, 15117: 288-306
  • [6] Check, Locate, Rectify: A Training-Free Layout Calibration System for Text-to-Image Generation
    Gong, Biao
    Huang, Siteng
    Feng, Yutong
    Zhang, Shiwei
    Li, Yuyuan
    Liu, Yu
    2024 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2024, 2024: 6624-6634
  • [7] FreeControl: Training-Free Spatial Control of Any Text-to-Image Diffusion Model with Any Condition
    Mo, Sicheng
    Mu, Fangzhou
    Lin, Kuan Heng
    Liu, Yanli
    Guan, Bochen
    Li, Yin
    Zhou, Bolei
    2024 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2024, 2024: 7465-7475
  • [8] Semantic-Aware Data Augmentation for Text-to-Image Synthesis
    Tan, Zhaorui
    Yang, Xi
    Huang, Kaizhu
    THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 6, 2024: 5098-5107
  • [9] Towards Language-Free Training for Text-to-Image Generation
    Zhou, Yufan
    Zhang, Ruiyi
    Chen, Changyou
    Li, Chunyuan
    Tensmeyer, Chris
    Yu, Tong
    Gu, Jiuxiang
    Xu, Jinhui
    Sun, Tong
    2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022), 2022: 17886-17896
  • [10] Survey of text-to-image synthesis
    Cao Y.
    Qin J.
    Ma Q.
    Sun H.
    Yan K.
    Wang L.
    Ren J.
    Zhejiang Daxue Xuebao (Gongxue Ban)/Journal of Zhejiang University (Engineering Science), 2024, 58 (02): 219-238