TRAINING-FREE LOCATION-AWARE TEXT-TO-IMAGE SYNTHESIS

Cited by: 2
Authors
Mao, Jiafeng [1]
Wang, Xueting [2]
Affiliations
[1] Univ Tokyo, Dept Informat & Commun Engn, Tokyo, Japan
[2] CyberAgent Inc, AI Lab, Tokyo, Japan
Keywords
diffusion model; text-to-image synthesis;
DOI
10.1109/ICIP49359.2023.10222616
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Current large-scale generative models are impressively efficient at generating high-quality images from text prompts. However, they cannot precisely control the size and position of objects in the generated image. In this study, we analyze the generative mechanism of the Stable Diffusion model and propose a new interactive generation paradigm that lets users specify the positions of generated objects without additional training. We also propose an object-detection-based evaluation metric to assess the control capability of the location-aware generation task. Our experimental results show that our method outperforms state-of-the-art methods in both control capability and image quality.
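The abstract does not say how the detection-based metric is computed. The sketch below illustrates one common way such a metric could work and is not the authors' definition. It assumes an object detector has already been run on each generated image and that its boxes for the prompted object are passed in as plain tuples. Each detected box is then compared with the user-specified box using intersection-over-union (IoU). The box format, the 0.5 threshold, and the helper names (`iou`, `location_hit`, `hit_rate`) are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

# Hypothetical box format: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix1 - ix0) * max(0.0, iy1 - iy0)
    area_a = max(0.0, a[2] - a[0]) * max(0.0, a[3] - a[1])
    area_b = max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def location_hit(target: Box, detections: Sequence[Box],
                 threshold: float = 0.5) -> Tuple[float, bool]:
    """Best IoU between the requested box and any detected box of the prompted
    class, plus whether it reaches the (assumed) threshold."""
    best = max((iou(target, d) for d in detections), default=0.0)
    return best, best >= threshold


def hit_rate(samples: List[Tuple[Box, Sequence[Box]]],
             threshold: float = 0.5) -> float:
    """Fraction of generated images whose object lands in the requested box.
    Each sample pairs a user-specified box with the detector's boxes."""
    if not samples:
        return 0.0
    hits = sum(location_hit(t, dets, threshold)[1] for t, dets in samples)
    return hits / len(samples)


if __name__ == "__main__":
    samples = [
        ((0, 0, 10, 10), [(0, 0, 10, 10)]),    # object placed exactly where requested
        ((0, 0, 10, 10), [(20, 20, 30, 30)]),  # object placed elsewhere
    ]
    print(hit_rate(samples))  # 0.5
```

Taking the best IoU over all detections means an extra, correctly labelled object elsewhere in the image does not lower the score. A stricter variant could also penalise such duplicates.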
Pages: 995-999
Page count: 5
Related papers
50 in total
  • [41] Capability-aware Prompt Reformulation Learning for Text-to-Image Generation
    Zhan, Jingtao
    Ai, Qingyao
    Liu, Yiqun
    Chen, Jia
    Ma, Shaoping
    PROCEEDINGS OF THE 47TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL, SIGIR 2024, 2024, : 2145 - 2155
  • [42] Perceptual Pyramid Adversarial Networks for Text-to-Image Synthesis
    Gao, Lianli
    Chen, Daiyuan
    Song, Jingkuan
    Xu, Xing
    Zhang, Dongxiang
    Shen, Heng Tao
    THIRTY-THIRD AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FIRST INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE / NINTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2019, : 8312 - 8319
  • [43] PROMPTIST: Automated Prompt Optimization for Text-to-Image Synthesis
    Li, WeiJie
    Wang, Jin
    Zhang, Xuejie
    NATURAL LANGUAGE PROCESSING AND CHINESE COMPUTING, PT II, NLPCC 2024, 2025, 15360 : 295 - 306
  • [44] Text-to-Image Synthesis Based on Machine Generated Captions
    Menardi, Marco
    Falcon, Alex
    Mohamed, Saida S.
    Seidenari, Lorenzo
    Serra, Giuseppe
    Del Bimbo, Alberto
    Tasso, Carlo
    DIGITAL LIBRARIES: THE ERA OF BIG DATA AND DATA SCIENCE, IRCDL 2020, 2020, 1177 : 62 - 74
  • [45] Stacking VAE and GAN for Context-aware Text-to-Image Generation
    Zhang, Chenrui
    Peng, Yuxin
    2018 IEEE FOURTH INTERNATIONAL CONFERENCE ON MULTIMEDIA BIG DATA (BIGMM), 2018,
  • [46] Rickrolling the Artist: Injecting Backdoors into Text Encoders for Text-to-Image Synthesis
    Struppek, Lukas
    Hintersdorf, Dominik
    Kersting, Kristian
    2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION, ICCV, 2023, : 4561 - 4573
  • [47] Text-to-Image Generation via Semi-Supervised Training
    Ji, Zhongyi
    Wang, Wenmin
    Chen, Baoyang
    Han, Xiao
    2020 IEEE INTERNATIONAL CONFERENCE ON VISUAL COMMUNICATIONS AND IMAGE PROCESSING (VCIP), 2020, : 265 - 268
  • [48] Inferring Semantic Layout for Hierarchical Text-to-Image Synthesis
    Hong, Seunghoon
    Yang, Dingdong
    Choi, Jongwook
    Lee, Honglak
    2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2018, : 7986 - 7994
  • [49] TISE: Bag of Metrics for Text-to-Image Synthesis Evaluation
    Dinh, Tan M.
    Nguyen, Rang
    Hua, Binh-Son
    COMPUTER VISION, ECCV 2022, PT XXXVI, 2022, 13696 : 594 - 609
  • [50] Semantic Object Accuracy for Generative Text-to-Image Synthesis
    Hinz, Tobias
    Heinrich, Stefan
    Wermter, Stefan
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2022, 44 (03) : 1552 - 1565