SceneScape: Text-Driven Consistent Scene Generation

Authors
Fridman, Rafail [1]
Abecasis, Amit [1]
Kasten, Yoni [2]
Dekel, Tali [1]
Affiliations
[1] Weizmann Inst Sci, Rehovot, Israel
[2] NVIDIA Res, Santa Clara, CA USA
Keywords
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We present a method for text-driven perpetual view generation: synthesizing long-term videos of various scenes solely from an input text prompt describing the scene and camera poses. We introduce a novel framework that generates such videos in an online fashion by combining the generative power of a pre-trained text-to-image model with the geometric priors learned by a pre-trained monocular depth prediction model. To tackle the pivotal challenge of achieving 3D consistency, i.e., synthesizing videos that depict geometrically plausible scenes, we deploy online test-time training to encourage the predicted depth map of the current frame to be geometrically consistent with the synthesized scene. The depth maps are used to construct a unified mesh representation of the scene, which is built progressively as the video is generated. In contrast to previous works, which are applicable only to limited domains, our method generates diverse scenes, such as walkthroughs in spaceships, caves, or ice castles. Project page: https://scenescape.github.io/
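The abstract says that per-frame depth maps are turned into a mesh of the scene. As a rough illustration of that one idea only, the sketch below unprojects a depth map through a pinhole camera and connects neighbouring pixels into triangles. It is not the authors' implementation: the function name `depth_to_mesh`, the intrinsics `fx`, `fy`, `cx`, `cy`, and the triangulation pattern are all assumptions made for this example, and the text-to-image model, test-time training, and merging of meshes across frames are left out.

```python
from typing import List, Tuple

Point = Tuple[float, float, float]
Triangle = Tuple[int, int, int]


def depth_to_mesh(depth: List[List[float]], fx: float, fy: float,
                  cx: float, cy: float) -> Tuple[List[Point], List[Triangle]]:
    """Unproject a depth map to 3D vertices and triangulate the pixel grid.

    Hypothetical helper for illustration; assumes a pinhole camera with
    focal lengths (fx, fy) and principal point (cx, cy), in pixel units.
    """
    height, width = len(depth), len(depth[0])

    vertices: List[Point] = []
    for v in range(height):
        for u in range(width):
            z = depth[v][u]
            # Pinhole unprojection: pixel (u, v) at depth z -> camera-space point.
            vertices.append(((u - cx) * z / fx, (v - cy) * z / fy, z))

    triangles: List[Triangle] = []
    for v in range(height - 1):
        for u in range(width - 1):
            i = v * width + u  # index of the top-left pixel of this 2x2 cell
            # Split each cell of four neighbouring pixels into two triangles.
            triangles.append((i, i + width, i + 1))
            triangles.append((i + 1, i + width, i + width + 1))
    return vertices, triangles


if __name__ == "__main__":
    toy_depth = [[2.0, 2.0], [2.0, 4.0]]
    verts, tris = depth_to_mesh(toy_depth, fx=1.0, fy=1.0, cx=0.5, cy=0.5)
    print(len(verts), "vertices,", len(tris), "triangles")
```

In a pipeline like the one described, a mesh of this kind would presumably be rendered from the next camera pose so that only newly revealed regions need to be synthesized; that step is outside the scope of this sketch.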
Pages: 18