SceneScape: Text-Driven Consistent Scene Generation

Authors
Fridman, Rafail [1]
Abecasis, Amit [1]
Kasten, Yoni [2]
Dekel, Tali [1]
Affiliations
[1] Weizmann Institute of Science, Rehovot, Israel
[2] NVIDIA Research, Santa Clara, CA, USA
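Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract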
We present a method for text-driven perpetual view generation: synthesizing long-term videos of various scenes solely from an input text prompt describing the scene and camera poses. We introduce a novel framework that generates such videos in an online fashion by combining the generative power of a pre-trained text-to-image model with the geometric priors learned by a pre-trained monocular depth prediction model. To tackle the pivotal challenge of achieving 3D consistency, i.e., synthesizing videos that depict geometrically plausible scenes, we deploy online test-time training to encourage the predicted depth map of the current frame to be geometrically consistent with the synthesized scene. The depth maps are used to construct a unified mesh representation of the scene, which is built progressively during the video generation process. In contrast to previous works, which are applicable only to limited domains, our method generates diverse scenes, such as walkthroughs in spaceships, caves, or ice castles. Project page: https://scenescape.github.io/
Pages: 18
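
Illustrative sketch (not the authors' code). The abstract mentions two geometric steps: depth maps are lifted into a unified mesh, and the predicted depth of the current frame is encouraged to agree with the scene synthesized so far. The snippet below is a minimal NumPy illustration of what such steps could look like. Everything in it is an assumption made for illustration: the pinhole camera parameters (`focal`, `cx`, `cy`), the function names, the two-triangles-per-pixel meshing, and the masked L1 form of the consistency term are not taken from this record.

```python
import numpy as np


def unproject_depth(depth, focal, cx, cy):
    """Lift an (H, W) depth map to (H*W, 3) camera-space points.

    Assumes a simple pinhole camera with a single focal length (hypothetical).
    """
    h, w = depth.shape
    v, u = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    x = (u - cx) / focal * depth
    y = (v - cy) / focal * depth
    return np.stack([x, y, depth], axis=-1).reshape(-1, 3)


def grid_faces(h, w):
    """Return two triangles per pixel quad, indexing a row-major vertex grid."""
    idx = np.arange(h * w).reshape(h, w)
    tl, tr = idx[:-1, :-1].ravel(), idx[:-1, 1:].ravel()
    bl, br = idx[1:, :-1].ravel(), idx[1:, 1:].ravel()
    upper = np.stack([tl, bl, tr], axis=1)
    lower = np.stack([tr, bl, br], axis=1)
    return np.concatenate([upper, lower], axis=0)


def masked_depth_l1(pred_depth, mesh_depth, visible):
    """Mean absolute depth error over pixels already covered by the mesh.

    A stand-in for a depth-consistency objective; the exact loss used by
    the paper is not stated in this record.
    """
    visible = visible.astype(bool)
    if not visible.any():
        return 0.0
    return float(np.abs(pred_depth - mesh_depth)[visible].mean())


if __name__ == "__main__":
    depth = np.full((4, 5), 2.0)          # toy depth map: a flat wall 2 units away
    verts = unproject_depth(depth, focal=5.0, cx=2.0, cy=1.5)
    faces = grid_faces(*depth.shape)
    print(verts.shape, faces.shape)       # vertex and triangle array shapes
    print(masked_depth_l1(depth, depth + 0.5, np.ones_like(depth)))
```

In a test-time training loop, a term like `masked_depth_l1` would be minimized with respect to the depth model's weights before new mesh vertices are added; that loop is omitted here because it depends on the specific depth network and renderer.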