TRAINING-FREE LOCATION-AWARE TEXT-TO-IMAGE SYNTHESIS

Cited by: 2
Authors
Mao, Jiafeng [1]
Wang, Xueting [2]
Affiliations
[1] Univ Tokyo, Dept Informat & Commun Engn, Tokyo, Japan
[2] CyberAgent Inc, AI Lab, Tokyo, Japan
Keywords
diffusion model; text-to-image synthesis
DOI
10.1109/ICIP49359.2023.10222616
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
摘要
Current large-scale generative models can efficiently produce high-quality images from text prompts. However, they lack the ability to precisely control the size and position of objects in the generated image. In this study, we analyze the generative mechanism of the Stable Diffusion model and propose a new interactive generation paradigm that allows users to specify the position of generated objects without additional training. Moreover, we propose an object detection-based evaluation metric to assess the control capability of location-aware generation methods. Our experimental results show that our method outperforms state-of-the-art methods in both control capability and image quality.
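The abstract does not spell out how the object detection-based metric is computed. The following is a minimal, hypothetical Python sketch assuming the metric runs an off-the-shelf detector on each generated image and compares the detected boxes against the user-specified regions via an IoU threshold; the function names, box format, and the 0.5 threshold are illustrative assumptions, not details taken from the paper.

    def box_iou(box_a, box_b):
        """IoU between two boxes given as (x1, y1, x2, y2)."""
        ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
        ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
        inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
        area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
        area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
        union = area_a + area_b - inter
        return inter / union if union > 0 else 0.0

    def location_control_score(target_boxes, detected_boxes, iou_threshold=0.5):
        """Hypothetical control metric: fraction of user-specified boxes that are
        matched by some detection with IoU >= iou_threshold.

        target_boxes:   list of (x1, y1, x2, y2) regions requested by the user
        detected_boxes: list of (x1, y1, x2, y2) boxes returned by an external
                        object detector on the generated image (not shown here)
        """
        if not target_boxes:
            return 0.0
        hits = 0
        for tgt in target_boxes:
            best = max((box_iou(tgt, det) for det in detected_boxes), default=0.0)
            if best >= iou_threshold:
                hits += 1
        return hits / len(target_boxes)

    # Example: one requested region, one detection that overlaps it well -> score 1.0
    print(location_control_score([(100, 100, 300, 300)], [(110, 120, 310, 290)]))

Averaging such a score over a benchmark of prompts with specified layouts would quantify how reliably a method places objects where the user asked, separately from image-quality metrics such as FID.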
Pages: 995-999
Number of pages: 5