Rerender A Video: Zero-Shot Text-Guided Video-to-Video Translation

Cited by: 11
Authors
Yang, Shuai [1 ]
Zhou, Yifan [1 ]
Liu, Ziwei [1 ]
Loy, Chen Change [1 ]
Affiliations
[1] Nanyang Technol Univ, S Lab, Singapore, Singapore
Keywords
Video translation; temporal consistency; off-the-shelf Stable Diffusion; optical flow;
DOI
10.1145/3610548.3618160
CLC Number
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Large text-to-image diffusion models have exhibited impressive proficiency in generating high-quality images. However, when applying these models to the video domain, ensuring temporal consistency across video frames remains a formidable challenge. This paper proposes a novel zero-shot text-guided video-to-video translation framework to adapt image models to videos. The framework includes two parts: key frame translation and full video translation. The first part uses an adapted diffusion model to generate key frames, with hierarchical cross-frame constraints applied to enforce coherence in shapes, textures and colors. The second part propagates the key frames to other frames with temporal-aware patch matching and frame blending. Our framework achieves global style and local texture temporal consistency at a low cost (without re-training or optimization). The adaptation is compatible with existing image diffusion techniques, allowing our framework to take advantage of them, such as customizing a specific subject with LoRA, and introducing extra spatial guidance with ControlNet. Extensive experimental results demonstrate the effectiveness of our proposed framework over existing methods in rendering high-quality and temporally-coherent videos. Code is available at our project page: https://www.mmlab-ntu.com/project/rerender/
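The two-stage structure described in the abstract (translate sparse key frames with a diffusion model, then propagate styles to the remaining frames) can be sketched as follows. This is a minimal illustrative skeleton, not the authors' implementation: the function names (`select_key_frames`, `translate_key_frame`, `propagate`) are hypothetical, and the diffusion model and temporal-aware patch matching are stubbed out with placeholders.

```python
# Hypothetical sketch of the two-stage zero-shot pipeline:
# stage 1 translates key frames, stage 2 propagates them to all frames.

def select_key_frames(num_frames, interval):
    """Pick every `interval`-th frame as a key frame."""
    return list(range(0, num_frames, interval))

def translate_key_frame(frame_idx):
    # Stage 1 placeholder: in the paper, an adapted diffusion model with
    # hierarchical cross-frame constraints (shape/texture/color) runs here.
    return f"styled({frame_idx})"

def propagate(key_frames, styled, num_frames):
    """Stage 2 placeholder: assign each frame the style of its nearest
    preceding key frame. This stands in for the temporal-aware patch
    matching and frame blending described in the abstract."""
    output = []
    for i in range(num_frames):
        nearest_key = max(k for k in key_frames if k <= i)
        output.append(styled[nearest_key])
    return output

num_frames, interval = 10, 4
keys = select_key_frames(num_frames, interval)          # [0, 4, 8]
styled = {k: translate_key_frame(k) for k in keys}       # stage 1
video = propagate(keys, styled, num_frames)              # stage 2
```

The key design point the sketch preserves is that the expensive diffusion step runs only on the sparse key frames; every other frame is produced cheaply by propagation, which is what makes the approach "zero-shot" and low-cost (no re-training or per-video optimization).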
Pages: 11
Related Papers
50 records in total
  • [1] Text2Video-Zero: Text-to-Image Diffusion Models are Zero-Shot Video Generators
    Khachatryan, Levon
    Movsisyan, Andranik
    Tadevosyan, Vahram
    Henschel, Roberto
    Wang, Zhangyang
    Navasardyan, Shant
    Shi, Humphrey
    2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2023), 2023, : 15908 - 15918
  • [2] Few-shot Video-to-Video Synthesis
    Wang, Ting-Chun
    Liu, Ming-Yu
    Tao, Andrew
    Liu, Guilin
    Kautz, Jan
    Catanzaro, Bryan
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 32 (NIPS 2019), 2019, 32
  • [3] Text-Guided Video Masked Autoencoder
    Fan, David
    Wang, Jue
    Liao, Shuai
    Zhang, Zhikang
    Bhat, Vimal
    Li, Xinyu
    COMPUTER VISION - ECCV 2024, PT V, 2025, 15063 : 282 - 298
  • [4] VidToMe: Video Token Merging for Zero-Shot Video Editing
    Li, Xirui
    Ma, Chao
    Yang, Xiaokang
    Yang, Ming-Hsuan
    2024 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2024, 2024, : 7486 - 7495
  • [5] Text-guided distillation learning to diversify video embeddings for text-video retrieval
    Lee, Sangmin
    Kim, Hyung-Il
    Ro, Yong Man
    PATTERN RECOGNITION, 2024, 156
  • [6] Video-to-Video Translation with Global Temporal Consistency
    Wei, Xingxing
    Zhu, Jun
    Feng, Sitong
    Su, Hang
    PROCEEDINGS OF THE 2018 ACM MULTIMEDIA CONFERENCE (MM'18), 2018, : 18 - 25
  • [7] Zero-Shot Text-Guided Object Generation with Dream Fields
    Jain, Ajay
    Mildenhall, Ben
    Barron, Jonathan T.
    Abbeel, Pieter
    Poole, Ben
    2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022), 2022, : 857 - 866
  • [8] Text-guided Graph Temporal Modeling for few-shot video classification
    Deng, Fuqin
    Zhong, Jiaming
    Li, Nannan
    Fu, Lanhui
    Jiang, Bingchun
    Yi, Ningbo
    Qi, Feng
    Xin, He
    Lam, Tin Lun
    ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2024, 137
  • [9] Zero-Shot Video Moment Retrieval With Angular Reconstructive Text Embeddings
    Jiang, Xun
    Xu, Xing
    Zhou, Zailei
    Yang, Yang
    Shen, Fumin
    Shen, Heng Tao
    IEEE TRANSACTIONS ON MULTIMEDIA, 2024, 26 : 9657 - 9670
  • [10] FRESCO: Spatial-Temporal Correspondence for Zero-Shot Video Translation
    Yang, Shuai
    Zhou, Yifan
    Liu, Ziwei
    Loy, Chen Change
    2024 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2024, 2024, : 8703 - 8712