SceneScape: Text-Driven Consistent Scene Generation

Cited by: 0
Authors
Fridman, Rafail [1 ]
Abecasis, Amit [1 ]
Kasten, Yoni [2 ]
Dekel, Tali [1 ]
Affiliations
[1] Weizmann Inst Sci, Rehovot, Israel
[2] NVIDIA Res, Santa Clara, CA USA
Keywords
DOI
none
CLC classification
TP18 [Artificial Intelligence Theory];
Discipline codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
We present a method for text-driven perpetual view generation - synthesizing long-term videos of various scenes solely from an input text prompt describing the scene and camera poses. We introduce a novel framework that generates such videos in an online fashion by combining the generative power of a pre-trained text-to-image model with the geometric priors learned by a pre-trained monocular depth prediction model. To tackle the pivotal challenge of achieving 3D consistency, i.e., synthesizing videos that depict geometrically-plausible scenes, we deploy online test-time training to encourage the predicted depth map of the current frame to be geometrically consistent with the synthesized scene. The depth maps are used to build a unified mesh representation of the scene, which is progressively constructed along the video generation process. In contrast to previous works, which are applicable only to limited domains, our method generates diverse scenes, such as walkthroughs in spaceships, caves, or ice castles. Project page: https://scenescape.github.io/
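The online pipeline the abstract describes can be sketched as a per-frame loop: render the existing scene mesh into the next camera pose, inpaint the newly revealed pixels with the text-to-image model, predict depth, align that depth to the already-fused geometry (the test-time training step), and fuse the result back into the unified mesh. Below is a minimal, hedged sketch of that control flow; all helper functions (`inpaint_frame`, `predict_depth`) are hypothetical numpy stubs standing in for the pre-trained models, and the consistency step is simplified to a blend, not the paper's actual optimization.

```python
import numpy as np

# Hypothetical stand-ins for the pre-trained models named in the abstract;
# a real system would wrap a text-to-image diffusion inpainter and a
# monocular depth network here.
def inpaint_frame(partial_rgb, mask, prompt):
    """Fill masked (newly revealed) pixels; stub: mean-color fill."""
    out = partial_rgb.copy()
    out[mask] = partial_rgb[~mask].mean(axis=0) if (~mask).any() else 0.5
    return out

def predict_depth(rgb):
    """Monocular depth stub: constant depth per frame."""
    return np.full(rgb.shape[:2], 2.0)

def generate_scene(prompt, poses, h=8, w=8, seed=0):
    rng = np.random.default_rng(seed)
    frames, fused_depths = [], []
    prev_depth = None
    for pose in poses:
        # 1) Render the current scene mesh into the new camera; pixels with
        #    no geometry yet are marked for inpainting (stub: random mask).
        rendered = np.full((h, w, 3), 0.5)
        mask = rng.random((h, w)) < 0.3
        # 2) The text-to-image model inpaints the disoccluded regions.
        rgb = inpaint_frame(rendered, mask, prompt)
        # 3) Predict depth, then enforce geometric consistency with the
        #    depth rendered from the existing mesh (the paper does this via
        #    online test-time training; stub: simple blend).
        depth = predict_depth(rgb)
        if prev_depth is not None:
            depth = 0.5 * depth + 0.5 * prev_depth
        # 4) Unproject and fuse into the unified mesh (stub: store the map).
        frames.append(rgb)
        fused_depths.append(depth)
        prev_depth = depth
    return frames, fused_depths
```

The key design point the sketch preserves is that generation is online: each frame is synthesized conditioned on geometry accumulated from all previous frames, rather than generating frames independently.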
Pages: 18
Related Papers
(50 items total)
  • [1] 3D-SceneDreamer: Text-Driven 3D-Consistent Scene Generation
    Zhang, Songchun
    Zhang, Yibo
    Zheng, Quan
    Ma, Rui
    Hua, Wei
    Bao, Hujun
    Xu, Weiwei
    Zou, Changqing
    2024 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2024, : 10170 - 10180
  • [2] Text2NeRF: Text-Driven 3D Scene Generation With Neural Radiance Fields
    Zhang, Jingbo
    Li, Xiaoyu
    Wan, Ziyu
    Wang, Can
    Liao, Jing
    IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS, 2024, 30 (12) : 7749 - 7762
  • [3] Text2Scene: Text-driven Indoor Scene Stylization with Part-aware Details
    Hwang, Inwoo
    Kim, Hyeonwoo
    Kim, Young Min
    2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR, 2023, : 1890 - 1899
  • [4] Text2Performer: Text-Driven Human Video Generation
    Jiang, Yuming
    Yang, Shuai
    Koh, Tong Liang
    Wu, Wayne
    Loy, Chen Change
    Liu, Ziwei
    2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2023), 2023, : 22690 - 22700
  • [5] MotionDiffuse: Text-Driven Human Motion Generation With Diffusion Model
    Zhang, Mingyuan
    Cai, Zhongang
    Pan, Liang
    Hong, Fangzhou
    Guo, Xinying
    Yang, Lei
    Liu, Ziwei
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2024, 46 (06) : 4115 - 4128
  • [6] Text-driven human image generation with texture and pose control
    Jin, Zhedong
    Xia, Guiyu
    Yang, Paike
    Wang, Mengxiang
    Sun, Yubao
    Liu, Qingshan
    NEUROCOMPUTING, 2025, 634
  • [7] Open-Vocabulary Text-Driven Human Image Generation
    Zhang, Kaiduo
    Sun, Muyi
    Sun, Jianxin
    Zhang, Kunbo
    Sun, Zhenan
    Tan, Tieniu
    INTERNATIONAL JOURNAL OF COMPUTER VISION, 2024, 132 (10) : 4379 - 4397
  • [8] Text2Human: Text-Driven Controllable Human Image Generation
    Jiang, Yuming
    Yang, Shuai
    Qiu, Haonan
    Wu, Wayne
    Loy, Chen Change
    Liu, Ziwei
    ACM TRANSACTIONS ON GRAPHICS, 2022, 41 (04):
  • [9] DreamEditor: Text-Driven 3D Scene Editing with Neural Fields
    Zhuang, Jingyu
    Wang, Chen
    Lin, Liang
    Liu, Lingjie
    Li, Guanbin
    PROCEEDINGS OF THE SIGGRAPH ASIA 2023 CONFERENCE PAPERS, 2023,
  • [10] Text-driven Visual Prosody Generation for Embodied Conversational Agents
    Chen, Jiali
    Liu, Yong
    Zhang, Zhimeng
    Fan, Changjie
    Ding, Yu
    PROCEEDINGS OF THE 19TH ACM INTERNATIONAL CONFERENCE ON INTELLIGENT VIRTUAL AGENTS (IVA' 19), 2019, : 108 - 110