Rerender A Video: Zero-Shot Text-Guided Video-to-Video Translation

Cited: 11
Authors
Yang, Shuai [1]
Zhou, Yifan [1]
Liu, Ziwei [1]
Loy, Chen Change [1]
Affiliations
[1] Nanyang Technol Univ, S Lab, Singapore, Singapore
Source
PROCEEDINGS OF THE SIGGRAPH ASIA 2023 CONFERENCE PAPERS | 2023
Keywords
Video translation; temporal consistency; off-the-shelf Stable Diffusion; optical flow;
DOI
10.1145/3610548.3618160
CLC classification number
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Large text-to-image diffusion models have exhibited impressive proficiency in generating high-quality images. However, when applying these models to the video domain, ensuring temporal consistency across video frames remains a formidable challenge. This paper proposes a novel zero-shot text-guided video-to-video translation framework to adapt image models to videos. The framework includes two parts: key frame translation and full video translation. The first part uses an adapted diffusion model to generate key frames, with hierarchical cross-frame constraints applied to enforce coherence in shapes, textures and colors. The second part propagates the key frames to other frames with temporal-aware patch matching and frame blending. Our framework achieves global style and local texture temporal consistency at a low cost (without re-training or optimization). The adaptation is compatible with existing image diffusion techniques, allowing our framework to take advantage of them, such as customizing a specific subject with LoRA and introducing extra spatial guidance with ControlNet. Extensive experimental results demonstrate the effectiveness of our proposed framework over existing methods in rendering high-quality and temporally-coherent videos. Code is available at our project page: https://www.mmlab-ntu.com/project/rerender/
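The abstract describes a two-stage pipeline. As a rough illustration of the second stage, the sketch below propagates stylized key frames to the remaining frames. It is a simplified stand-in, not the paper's method: where the authors use temporal-aware patch matching and frame blending, this sketch warps the two nearest key frames along Farnebäck optical flow (computed on the original frames) and blends them by temporal distance. The key frames themselves are assumed to come from an off-the-shelf Stable Diffusion image pipeline, optionally with ControlNet or LoRA. The helper names `warp` and `propagate`, and the assumption that the first and last frames are key frames, are illustrative only.

```python
# Minimal sketch (not the paper's method): propagate stylized key frames to the
# in-between frames by warping the two nearest key frames along dense optical
# flow estimated on the ORIGINAL frames, then blending by temporal distance.
# Assumes OpenCV and NumPy; `frames` is a list of uint8 BGR arrays of equal size,
# `keyframes` maps frame index -> stylized frame, and the first and last frames
# are key frames (illustrative assumptions, not from the paper).
import cv2
import numpy as np

def warp(src, flow):
    """Backward-warp `src` along a dense flow field of shape (H, W, 2)."""
    h, w = flow.shape[:2]
    grid_x, grid_y = np.meshgrid(np.arange(w), np.arange(h))
    map_x = (grid_x + flow[..., 0]).astype(np.float32)
    map_y = (grid_y + flow[..., 1]).astype(np.float32)
    return cv2.remap(src, map_x, map_y, cv2.INTER_LINEAR,
                     borderMode=cv2.BORDER_REPLICATE)

def propagate(frames, keyframes):
    keys = sorted(keyframes)
    out = {}
    for i, frame in enumerate(frames):
        if i in keyframes:
            out[i] = keyframes[i]
            continue
        prev_k = max(k for k in keys if k < i)
        next_k = min(k for k in keys if k > i)
        gray_i = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

        def pull(k):
            # Flow from frame i to key frame k lets us sample the stylized key
            # frame at the locations corresponding to frame i's pixels.
            gray_k = cv2.cvtColor(frames[k], cv2.COLOR_BGR2GRAY)
            flow = cv2.calcOpticalFlowFarneback(gray_i, gray_k, None,
                                                0.5, 3, 15, 3, 5, 1.2, 0)
            return warp(keyframes[k], flow)

        # Blend the two warped key frames, weighted by temporal proximity.
        w_next = (i - prev_k) / (next_k - prev_k)
        out[i] = cv2.addWeighted(pull(prev_k), 1.0 - w_next,
                                 pull(next_k), w_next, 0.0)
    return [out[i] for i in range(len(frames))]
```

Plain flow warping of this kind blurs fine stylized texture and breaks at occlusions, which is why the paper relies on temporal-aware patch matching and frame blending for this stage instead.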
Pages: 11
Related papers
50 records in total
  • [21] Unsupervised video-to-video translation with preservation of frame modification tendency
    Liu, Huajun
    Li, Chao
    Lei, Dian
    Zhu, Qing
VISUAL COMPUTER, 2020, 36 (10-12) : 2105 - 2116
  • [22] Preserving Semantic and Temporal Consistency for Unpaired Video-to-Video Translation
    Park, Kwanyong
    Woo, Sanghyun
    Kim, Dahun
    Cho, Donghyeon
    Kweon, In So
    PROCEEDINGS OF THE 27TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA (MM'19), 2019, : 1248 - 1257
  • [23] Unsupervised video-to-video translation with preservation of frame modification tendency
    Huajun Liu
    Chao Li
    Dian Lei
    Qing Zhu
    The Visual Computer, 2020, 36 : 2105 - 2116
  • [24] Zero-Shot Video Retrieval Using Content and Concepts
    Dalton, Jeffrey
    Allan, James
    Mirajkar, Pranav
    PROCEEDINGS OF THE 22ND ACM INTERNATIONAL CONFERENCE ON INFORMATION & KNOWLEDGE MANAGEMENT (CIKM'13), 2013, : 1857 - 1860
  • [25] Orthogonal Temporal Interpolation for Zero-Shot Video Recognition
    Zhu, Yan
    Zhuo, Junbao
    Ma, Bin
    Geng, Jiajia
    Wei, Xiaoming
    Wei, Xiaolin
    Wang, Shuhui
    PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2023, 2023, : 7491 - 7501
  • [26] Latent Concept Extraction for Zero-shot Video Retrieval
    Ueki, Kazuya
    2018 INTERNATIONAL CONFERENCE ON IMAGE AND VISION COMPUTING NEW ZEALAND (IVCNZ), 2018,
  • [27] Learning to Model Relationships for Zero-Shot Video Classification
    Gao, Junyu
    Zhang, Tianzhu
    Xu, Changsheng
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2021, 43 (10) : 3476 - 3491
  • [28] VideoCLIP: Contrastive Pre-training for Zero-shot Video-Text Understanding
    Xu, Hu
    Ghosh, Gargi
    Huang, Po-Yao
    Okhonko, Dmytro
    Aghajanyan, Armen
    Metze, Florian
    Zettlemoyer, Luke
    Feichtenhofer, Christoph
    2021 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP 2021), 2021, : 6787 - 6800
  • [29] WAVE: Warping DDIM Inversion Features for Zero-Shot Text-to-Video Editing
    Feng, Yutang
    Gao, Sicheng
    Bao, Yuxiang
    Wang, Xiaodi
    Han, Shumin
    Zhang, Juan
    Zhang, Baochang
    Yao, Angela
    COMPUTER VISION - ECCV 2024, PT LXXVI, 2025, 15134 : 38 - 55
  • [30] CLIPCAM: A SIMPLE BASELINE FOR ZERO-SHOT TEXT-GUIDED OBJECT AND ACTION LOCALIZATION
    Hsia, Hsuan-An
    Lin, Che-Hsien
    Kung, Bo-Han
    Chen, Jhao-Ting
    Tan, Daniel Stanley
    Chen, Jun-Cheng
    Hua, Kai-Lung
    2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 4453 - 4457