SceneScape: Text-Driven Consistent Scene Generation

Cited by: 0

Authors:

Fridman, Rafail [1]

Abecasis, Amit [1]

Kasten, Yoni [2]

Dekel, Tali [1]

Affiliations:

[1] Weizmann Inst Sci, Rehovot, Israel

[2] NVIDIA Res, Santa Clara, CA USA

Source:

ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023) | 2023

Keywords:

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

We present a method for text-driven perpetual view generation - synthesizing long-term videos of various scenes solely from an input text prompt describing the scene and camera poses. We introduce a novel framework that generates such videos in an online fashion by combining the generative power of a pre-trained text-to-image model with the geometric priors learned by a pre-trained monocular depth prediction model. To tackle the pivotal challenge of achieving 3D consistency, i.e., synthesizing videos that depict geometrically-plausible scenes, we deploy an online test-time training to encourage the predicted depth map of the current frame to be geometrically consistent with the synthesized scene. The depth maps are used to construct a unified mesh representation of the scene, which is progressively constructed along the video generation process. In contrast to previous works, which are applicable only to limited domains, our method generates diverse scenes, such as walkthroughs in spaceships, caves, or ice castles. Project page: https://scenescape.github.io/
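The abstract states that depth maps are used to construct a mesh representation of the scene. As a rough illustration of that general idea only (not the authors' implementation), the sketch below unprojects a single depth map into camera-space vertices under an assumed pinhole camera model and connects neighbouring pixels into triangles. The function name depth_to_mesh and the intrinsics fx, fy, cx, cy are hypothetical names chosen for this example.

```python
from typing import List, Tuple

Point = Tuple[float, float, float]
Face = Tuple[int, int, int]


def depth_to_mesh(depth: List[List[float]], fx: float, fy: float,
                  cx: float, cy: float) -> Tuple[List[Point], List[Face]]:
    """Unproject a depth map into camera-space vertices and grid triangles.

    Assumes a pinhole camera with focal lengths (fx, fy) and principal
    point (cx, cy); these intrinsics are illustrative assumptions.
    """
    h, w = len(depth), len(depth[0])
    vertices: List[Point] = []
    for v in range(h):
        for u in range(w):
            z = depth[v][u]
            # Pinhole model: pixel (u, v) at depth z maps to (x, y, z).
            vertices.append(((u - cx) * z / fx, (v - cy) * z / fy, z))
    faces: List[Face] = []
    for v in range(h - 1):
        for u in range(w - 1):
            i = v * w + u  # index of the top-left vertex of this grid cell
            # Split each 2x2 pixel cell into two triangles.
            faces.append((i, i + w, i + 1))
            faces.append((i + 1, i + w, i + w + 1))
    return vertices, faces


# Toy 2x3 depth map with hypothetical intrinsics.
verts, faces = depth_to_mesh([[1.0, 1.0, 2.0], [1.0, 1.0, 2.0]],
                             fx=1.0, fy=1.0, cx=1.0, cy=0.5)
```

A practical pipeline would also need to transform vertices by the camera pose and handle depth discontinuities, which this sketch leaves out.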


Pages: 18

50 items in total

[31] Free-Editor: Zero-Shot Text-Driven 3D Scene Editing
Karim, Nazmul
Iqbal, Hasan
Khalid, Umar
Chen, Chen
Hua, Jing
COMPUTER VISION - ECCV 2024, PT LXXX, 2025, 15138 : 436 - 453
[32] StyleCLIP: Text-Driven Manipulation of StyleGAN Imagery
Patashnik, Or
Wu, Zongze
Shechtman, Eli
Cohen-Or, Daniel
Lischinski, Dani
2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021), 2021, : 2065 - 2074
[33] InterFusion: Text-Driven Generation of 3D Human-Object Interaction
Dai, Sisi
Li, Wenhao
Sun, Haowen
Huang, Haibin
Ma, Chongyang
Huang, Hui
Xu, Kai
Hu, Ruizhen
COMPUTER VISION - ECCV 2024, PT XLVIII, 2025, 15106 : 18 - 35
[34] Text2Mesh: Text-Driven Neural Stylization for Meshes
Michel, Oscar
Bar-On, Roi
Liu, Richard
Benaim, Sagie
Hanocka, Rana
2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2022, : 13482 - 13492
[35] Unsupervised Prompt Tuning for Text-Driven Object Detection
He, Weizhen
Chen, Weijie
Chen, Binbin
Yang, Shicai
Xie, Di
Lin, Luojun
Qi, Donglian
Zhuang, Yueting
2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION, ICCV, 2023, : 2651 - 2661
[36] AvatarCLIP: Zero-Shot Text-Driven Generation and Animation of 3D Avatars
Hong, Fangzhou
Zhang, Mingyuan
Pan, Liang
Cai, Zhongang
Yang, Lei
Liu, Ziwei
ACM TRANSACTIONS ON GRAPHICS, 2022, 41 (04):
[37] Exploring Text-Driven Approaches for Online Action Detection
Benavent-Lledo, Manuel
Mulero-Perez, David
Ortiz-Perez, David
Garcia-Rodriguez, Jose
Orts-Escolano, Sergio
BIOINSPIRED SYSTEMS FOR TRANSLATIONAL APPLICATIONS: FROM ROBOTICS TO SOCIAL ENGINEERING, PT II, IWINAC 2024, 2024, 14675 : 55 - 64
[38] Blended Diffusion for Text-driven Editing of Natural Images
Avrahami, Omri
Lischinski, Dani
Fried, Ohad
2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022), 2022, : 18187 - 18197
[39] Comparing text-driven and speech-driven visual speech synthesisers
Theobald, Barry-John
Cawley, Gavin
Bangham, Andrew
Matthews, Iain
Wilkinson, Nicholas
INTERSPEECH 2008: 9TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2008, VOLS 1-5, 2008, : 2322 - 2322
[40] Explainable Text-Driven Neural Network for Stock Prediction
Yang, Linyi
Zhang, Zheng
Xiong, Su
Wei, Lirui
Ng, James
Xu, Lina
Dong, Ruihai
PROCEEDINGS OF 2018 5TH IEEE INTERNATIONAL CONFERENCE ON CLOUD COMPUTING AND INTELLIGENCE SYSTEMS (CCIS), 2018, : 441 - 445
