Rerender A Video: Zero-Shot Text-Guided Video-to-Video Translation

Cited: 11
Authors
Yang, Shuai [1]
Zhou, Yifan [1]
Liu, Ziwei [1]
Loy, Chen Change [1]
Affiliations
[1] Nanyang Technol Univ, S Lab, Singapore, Singapore
Source
PROCEEDINGS OF THE SIGGRAPH ASIA 2023 CONFERENCE PAPERS | 2023
Keywords
Video translation; temporal consistency; off-the-shelf Stable Diffusion; optical flow;
DOI
10.1145/3610548.3618160
CLC classification number
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Large text-to-image diffusion models have exhibited impressive proficiency in generating high-quality images. However, when applying these models to the video domain, ensuring temporal consistency across video frames remains a formidable challenge. This paper proposes a novel zero-shot text-guided video-to-video translation framework to adapt image models to videos. The framework includes two parts: key frame translation and full video translation. The first part uses an adapted diffusion model to generate key frames, with hierarchical cross-frame constraints applied to enforce coherence in shapes, textures and colors. The second part propagates the key frames to other frames with temporal-aware patch matching and frame blending. Our framework achieves global style and local texture temporal consistency at a low cost (without re-training or optimization). The adaptation is compatible with existing image diffusion techniques, allowing our framework to take advantage of them, such as customizing a specific subject with LoRA and introducing extra spatial guidance with ControlNet. Extensive experimental results demonstrate the effectiveness of our proposed framework over existing methods in rendering high-quality and temporally-coherent videos. Code is available at our project page: https://www.mmlab-ntu.com/project/rerender/
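The abstract describes a two-stage pipeline. As a rough illustration of the second stage, the sketch below propagates stylized key frames to the remaining frames. It is a simplified stand-in, not the paper's method: where the authors use temporal-aware patch matching and frame blending, this sketch warps the two nearest key frames along Farnebäck optical flow (computed on the original frames) and blends them by temporal distance. The key frames themselves are assumed to come from an off-the-shelf Stable Diffusion image pipeline, optionally with ControlNet or LoRA. The helper names `warp` and `propagate`, and the assumption that the first and last frames are key frames, are illustrative only.

```python
# Minimal sketch (not the paper's method): propagate stylized key frames to the
# in-between frames by warping the two nearest key frames along dense optical
# flow estimated on the ORIGINAL frames, then blending by temporal distance.
# Assumes OpenCV and NumPy; `frames` is a list of uint8 BGR arrays of equal size,
# `keyframes` maps frame index -> stylized frame, and the first and last frames
# are key frames (illustrative assumptions, not from the paper).
import cv2
import numpy as np

def warp(src, flow):
    """Backward-warp `src` along a dense flow field of shape (H, W, 2)."""
    h, w = flow.shape[:2]
    grid_x, grid_y = np.meshgrid(np.arange(w), np.arange(h))
    map_x = (grid_x + flow[..., 0]).astype(np.float32)
    map_y = (grid_y + flow[..., 1]).astype(np.float32)
    return cv2.remap(src, map_x, map_y, cv2.INTER_LINEAR,
                     borderMode=cv2.BORDER_REPLICATE)

def propagate(frames, keyframes):
    keys = sorted(keyframes)
    out = {}
    for i, frame in enumerate(frames):
        if i in keyframes:
            out[i] = keyframes[i]
            continue
        prev_k = max(k for k in keys if k < i)
        next_k = min(k for k in keys if k > i)
        gray_i = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

        def pull(k):
            # Flow from frame i to key frame k lets us sample the stylized key
            # frame at the locations corresponding to frame i's pixels.
            gray_k = cv2.cvtColor(frames[k], cv2.COLOR_BGR2GRAY)
            flow = cv2.calcOpticalFlowFarneback(gray_i, gray_k, None,
                                                0.5, 3, 15, 3, 5, 1.2, 0)
            return warp(keyframes[k], flow)

        # Blend the two warped key frames, weighted by temporal proximity.
        w_next = (i - prev_k) / (next_k - prev_k)
        out[i] = cv2.addWeighted(pull(prev_k), 1.0 - w_next,
                                 pull(next_k), w_next, 0.0)
    return [out[i] for i in range(len(frames))]
```

Plain flow warping of this kind blurs fine stylized texture and breaks at occlusions, which is why the paper relies on temporal-aware patch matching and frame blending for this stage instead.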
Pages: 11
Related papers
50 records in total
  • [21] Unsupervised video-to-video translation with preservation of frame modification tendency
    Liu, Huajun
    Li, Chao
    Lei, Dian
    Zhu, Qing
VISUAL COMPUTER, 2020, 36 (10-12) : 2105 - 2116
  • [22] Preserving Semantic and Temporal Consistency for Unpaired Video-to-Video Translation
    Park, Kwanyong
    Woo, Sanghyun
    Kim, Dahun
    Cho, Donghyeon
    Kweon, In So
    PROCEEDINGS OF THE 27TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA (MM'19), 2019, : 1248 - 1257
  • [23] Unsupervised video-to-video translation with preservation of frame modification tendency
    Huajun Liu
    Chao Li
    Dian Lei
    Qing Zhu
    The Visual Computer, 2020, 36 : 2105 - 2116
  • [24] Zero-Shot Video Retrieval Using Content and Concepts
    Dalton, Jeffrey
    Allan, James
    Mirajkar, Pranav
    PROCEEDINGS OF THE 22ND ACM INTERNATIONAL CONFERENCE ON INFORMATION & KNOWLEDGE MANAGEMENT (CIKM'13), 2013, : 1857 - 1860
  • [25] Orthogonal Temporal Interpolation for Zero-Shot Video Recognition
    Zhu, Yan
    Zhuo, Junbao
    Ma, Bin
    Geng, Jiajia
    Wei, Xiaoming
    Wei, Xiaolin
    Wang, Shuhui
    PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2023, 2023, : 7491 - 7501
  • [26] Latent Concept Extraction for Zero-shot Video Retrieval
    Ueki, Kazuya
    2018 INTERNATIONAL CONFERENCE ON IMAGE AND VISION COMPUTING NEW ZEALAND (IVCNZ), 2018,
  • [27] Learning to Model Relationships for Zero-Shot Video Classification
    Gao, Junyu
    Zhang, Tianzhu
    Xu, Changsheng
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2021, 43 (10) : 3476 - 3491
  • [28] VideoCLIP: Contrastive Pre-training for Zero-shot Video-Text Understanding
    Xu, Hu
    Ghosh, Gargi
    Huang, Po-Yao
    Okhonko, Dmytro
    Aghajanyan, Armen
    Metze, Florian
    Zettlemoyer, Luke
    Feichtenhofer, Christoph
    2021 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP 2021), 2021, : 6787 - 6800
  • [29] WAVE: Warping DDIM Inversion Features for Zero-Shot Text-to-Video Editing
    Feng, Yutang
    Gao, Sicheng
    Bao, Yuxiang
    Wang, Xiaodi
    Han, Shumin
    Zhang, Juan
    Zhang, Baochang
    Yao, Angela
    COMPUTER VISION - ECCV 2024, PT LXXVI, 2025, 15134 : 38 - 55
  • [30] CLIPCAM: A SIMPLE BASELINE FOR ZERO-SHOT TEXT-GUIDED OBJECT AND ACTION LOCALIZATION
    Hsia, Hsuan-An
    Lin, Che-Hsien
    Kung, Bo-Han
    Chen, Jhao-Ting
    Tan, Daniel Stanley
    Chen, Jun-Cheng
    Hua, Kai-Lung
    2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 4453 - 4457