LayoutGPT: Compositional Visual Planning and Generation with Large Language Models

Cited by: 0
Authors
Feng, Weixi [1]
Zhu, Wanrong [1]
Fu, Tsu-jui [1]
Jampani, Varun [2]
Akula, Arjun [2]
He, Xuehai [3]
Basu, Sugato [2]
Wang, Xin Eric [3]
Wang, William Yang [1]
Affiliations
[1] Univ Calif Santa Barbara, Santa Barbara, CA 93106 USA
[2] Google, Mountain View, CA USA
[3] Univ Calif Santa Cruz, Santa Cruz, CA USA
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Attaining a high degree of user controllability in visual generation often requires intricate, fine-grained inputs like layouts. However, such inputs impose a substantial burden on users when compared to simple text inputs. To address this issue, we study how Large Language Models (LLMs) can serve as visual planners by generating layouts from text conditions, and thus collaborate with visual generative models. We propose LayoutGPT, a method to compose in-context visual demonstrations in style sheet language to enhance the visual planning skills of LLMs. LayoutGPT can generate plausible layouts in multiple domains, ranging from 2D images to 3D indoor scenes. LayoutGPT also shows superior performance in converting challenging language concepts like numerical and spatial relations to layout arrangements for faithful text-to-image generation. When combined with a downstream image generation model, LayoutGPT outperforms text-to-image models/systems by 20-40% and performs comparably to human users in designing visual layouts for numerical and spatial correctness. Lastly, LayoutGPT achieves performance comparable to supervised methods in 3D indoor scene synthesis, demonstrating its effectiveness and potential in multiple visual domains.
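As a rough illustration of what "in-context visual demonstrations in style sheet language" could look like, the sketch below serializes bounding boxes as CSS-like rules and concatenates a few captioned demonstrations ahead of an open query caption. This is a minimal sketch under stated assumptions, not the authors' implementation: the `Box` class, the `box_to_css` and `build_prompt` helpers, the property order, the pixel units, and the example coordinates are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Box:
    """One object in a 2D layout (hypothetical structure, pixel units)."""
    label: str
    left: int
    top: int
    width: int
    height: int


def box_to_css(box: Box) -> str:
    """Serialize one box as a single CSS-like rule (assumed format)."""
    return (
        f"{box.label} {{left: {box.left}px; top: {box.top}px; "
        f"width: {box.width}px; height: {box.height}px}}"
    )


def build_prompt(demos, query_caption: str) -> str:
    """Join (caption, boxes) demonstrations, then append the open query.

    Each caption is written as a CSS comment so that the whole prompt
    stays in one consistent, style-sheet-like syntax.
    """
    parts = []
    for caption, boxes in demos:
        parts.append(f"/* {caption} */")
        parts.extend(box_to_css(b) for b in boxes)
        parts.append("")  # blank line between demonstrations
    parts.append(f"/* {query_caption} */")
    return "\n".join(parts)


# Made-up demonstration layout; coordinates are illustrative only.
demos = [
    (
        "two apples on a table",
        [
            Box("apple", 10, 40, 20, 20),
            Box("apple", 50, 40, 20, 20),
            Box("table", 0, 60, 100, 40),
        ],
    )
]
prompt = build_prompt(demos, "a cat to the left of a dog")
print(prompt)
```

In such a setup the resulting string would be sent to an LLM, and the rules it returns would be parsed back into boxes for a layout-conditioned image generator; both of those steps are outside this sketch.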
Pages: 26