TRAINING-FREE LOCATION-AWARE TEXT-TO-IMAGE SYNTHESIS

Cited by: 2

Authors:

Mao, Jiafeng [1]

Wang, Xueting [2]

Affiliations:

[1] Univ Tokyo, Dept Informat & Commun Engn, Tokyo, Japan

[2] CyberAgent Inc, AI Lab, Tokyo, Japan

Source:

2023 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, ICIP | 2023

Keywords:

diffusion model; text-to-image synthesis

DOI:

10.1109/ICIP49359.2023.10222616

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Current large-scale generative models have impressive efficiency in generating high-quality images based on text prompts. However, they lack the ability to precisely control the size and position of objects in the generated image. In this study, we analyze the generative mechanism of the Stable Diffusion model and propose a new interactive generation paradigm that allows users to specify the position of generated objects without additional training. Moreover, we propose an object detection-based evaluation metric to assess the control capability in the location-aware generation task. Our experimental results show that our method outperforms state-of-the-art methods on both control capacity and image quality.

Pages: 995-999

Page count: 5

