TRAINING-FREE LOCATION-AWARE TEXT-TO-IMAGE SYNTHESIS

Cited by: 2
Authors
Mao, Jiafeng [1 ]
Wang, Xueting [2 ]
Affiliations
[1] Univ Tokyo, Dept Informat & Commun Engn, Tokyo, Japan
[2] CyberAgent Inc, AI Lab, Tokyo, Japan
Keywords
diffusion model; text-to-image synthesis;
DOI
10.1109/ICIP49359.2023.10222616
CLC Classification
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104; 0812; 0835; 1405;
Abstract
Current large-scale generative models are impressively effective at generating high-quality images from text prompts. However, they lack the ability to precisely control the size and position of objects in the generated image. In this study, we analyze the generative mechanism of the Stable Diffusion model and propose a new interactive generation paradigm that allows users to specify the position of generated objects without additional training. Moreover, we propose an object-detection-based evaluation metric to assess the control capability of the location-aware generation task. Our experimental results show that our method outperforms state-of-the-art methods in both control capability and image quality.
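Note: the abstract mentions an object-detection-based metric for assessing location control, but its exact formulation is not given in this record. The sketch below is only an illustration of that general idea, not the authors' metric: it assumes axis-aligned (x1, y1, x2, y2) boxes, detections produced by some off-the-shelf detector run on the generated image, and a hypothetical IoU threshold of 0.5 for counting a requested box as satisfied.

# Hypothetical sketch of a detection-based location-control score.
# Assumptions (not from the paper): boxes are (x1, y1, x2, y2) in pixels;
# a target box counts as "hit" if some detection of the same class
# overlaps it with IoU >= 0.5.

from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def location_control_score(
    targets: Dict[str, List[Box]],     # user-specified layout: class -> boxes
    detections: Dict[str, List[Box]],  # detector output on the generated image
    iou_threshold: float = 0.5,
) -> float:
    """Fraction of requested boxes matched by a same-class detection."""
    hits, total = 0, 0
    for cls, boxes in targets.items():
        for tgt in boxes:
            total += 1
            if any(iou(tgt, det) >= iou_threshold for det in detections.get(cls, [])):
                hits += 1
    return hits / total if total else 0.0


if __name__ == "__main__":
    # Example: the user asked for a dog in the left half of a 512x512 image.
    layout = {"dog": [(0, 128, 256, 448)]}
    detected = {"dog": [(20, 140, 250, 430)], "ball": [(300, 300, 360, 360)]}
    print(f"control score: {location_control_score(layout, detected):.2f}")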
Pages: 995-999
Number of pages: 5
Related Papers
50 records in total
  • [21] Language-vision matching for text-to-image synthesis with context-aware GAN
    Hou, Yingli
    Zhang, Wei
    Zhu, Zhiliang
    Yu, Hai
    EXPERT SYSTEMS WITH APPLICATIONS, 2024, 255
  • [22] DAE-GAN: Dynamic Aspect-aware GAN for Text-to-Image Synthesis
    Ruan, Shulan
    Zhang, Yong
    Zhang, Kun
    Fan, Yanbo
    Tang, Fan
    Liu, Qi
    Chen, Enhong
    2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021), 2021, : 13940 - 13949
  • [23] Location Obfuscation Framework for Training-Free Localization System
    Doan, Thong M.
    Dinh, Han N.
    Nguyen, Nam T.
    Tran, Phuoc T.
    INFORMATION SYSTEMS SECURITY (ICISS 2014), 2014, 8880 : 464 - 476
  • [24] AtHom: Two Divergent Attentions Stimulated By Homomorphic Training in Text-to-Image Synthesis
    Shi, Zhenbo
    Chen, Zhi
    Xu, Zhenbo
    Yang, Wei
    Huang, Liusheng
    PROCEEDINGS OF THE 30TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2022, 2022, : 2211 - 2219
  • [25] Scaling up GANs for Text-to-Image Synthesis
    Kang, Minguk
    Zhu, Jun-Yan
    Zhang, Richard
    Park, Jaesik
    Shechtman, Eli
    Paris, Sylvain
    Park, Taesung
    2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2023, : 10124 - 10134
  • [26] Efficient Neural Architecture for Text-to-Image Synthesis
    Souza, Douglas M.
    Wehrmann, Jonatas
    Ruiz, Duncan D.
    2020 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2020,
  • [27] FouriScale: A Frequency Perspective on Training-Free High-Resolution Image Synthesis
    Huang, Linjiang
    Fang, Rongyao
    Zhang, Aiping
    Song, Guanglu
    Liu, Si
    Liu, Yu
    Li, Hongsheng
    COMPUTER VISION - ECCV 2024, PT XII, 2025, 15070 : 196 - 212
  • [28] Joint Embedding based Text-to-Image Synthesis
    Wang, Menglan
    Yu, Yue
    Li, Benyuan
    2020 IEEE 32ND INTERNATIONAL CONFERENCE ON TOOLS WITH ARTIFICIAL INTELLIGENCE (ICTAI), 2020, : 432 - 436
  • [29] Text-to-Image Synthesis via Aesthetic Layout
    Baraheem, Samah Saeed
    Trung-Nghia Le
    Nguyen, Tam V.
    MM '20: PROCEEDINGS OF THE 28TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, 2020, : 4485 - 4487
  • [30] A Comprehensive Pipeline for Complex Text-to-Image Synthesis
    Fei Fang
    Fei Luo
    Hong-Pan Zhang
    Hua-Jian Zhou
    Alix L. H. Chow
    Chun-Xia Xiao
    Journal of Computer Science and Technology, 2020, 35 : 522 - 537