MeDM: Mediating Image Diffusion Models for Video-to-Video Translation with Temporal Correspondence Guidance

Cited by: 0
Authors
Chu, Ernie [1 ]
Huang, Tzuhsuan [1 ]
Lin, Shuo-Yen [1 ]
Chen, Jun-Cheng [1 ]
Affiliations
[1] Acad Sinica, Res Ctr Informat Technol Innovat, 128 Acad Rd,Sect 2, Taipei, Taiwan
Keywords
DOI
Not available
CLC Number
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405
Abstract
This study introduces an efficient and effective method, MeDM, that utilizes pre-trained image Diffusion Models for video-to-video translation with consistent temporal flow. The proposed framework can render videos from scene position information, such as a normal G-buffer, or perform text-guided editing on videos captured in real-world scenarios. We employ explicit optical flows to construct a practical coding that enforces physical constraints on generated frames and mediates independent frame-wise scores. By leveraging this coding, maintaining temporal consistency in the generated videos can be framed as an optimization problem with a closed-form solution. To ensure compatibility with Stable Diffusion, we also suggest a workaround for modifying observation-space scores in latent Diffusion Models. Notably, MeDM does not require fine-tuning or test-time optimization of the Diffusion Models. Through extensive qualitative, quantitative, and subjective experiments on various benchmarks, the study demonstrates the effectiveness and superiority of the proposed approach. Our project page can be found at https://medm2023.github.io/.
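The abstract's key step — mediating independent frame-wise estimates through an optical-flow coding, with temporal consistency solved in closed form — can be sketched as follows. This is an illustrative reconstruction under assumptions, not the authors' implementation: it assumes optical flow has already linked corresponding pixels across frames into shared "units" (a hypothetical `unit_ids` map, with every id appearing at least once), in which case the least-squares problem of finding per-unit values closest to the per-frame estimates has the closed-form solution of the per-unit mean.

```python
import numpy as np

def mediate_frames(frames, unit_ids):
    """Sketch of flow-mediated consistency (illustrative, not MeDM's code).

    frames   : (T, H, W) float array of independent per-frame estimates
    unit_ids : (T, H, W) int array; pixels sharing an id are the same
               physical point tracked across frames by optical flow

    Minimizing sum over pixels of (frame_value - unit_value)^2 w.r.t. the
    per-unit values yields the closed-form solution: the mean of all
    observations within each unit. We compute it and scatter it back.
    """
    flat_vals = frames.ravel()
    flat_ids = unit_ids.ravel()
    n_units = flat_ids.max() + 1
    # Per-unit sums and counts in one pass each
    sums = np.bincount(flat_ids, weights=flat_vals, minlength=n_units)
    counts = np.bincount(flat_ids, minlength=n_units)
    unit_means = sums / counts
    # Every pixel in a unit receives the same (consistent) value
    return unit_means[unit_ids]
```

In this toy form, pixels that the flow declares identical end up with identical values across frames, which is exactly the physical constraint the abstract describes; in the actual method this averaging mediates diffusion scores rather than raw pixels.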
Pages: 1353-1361 (9 pages)
Related Papers (50 total)
  • [1] Video-to-Video Translation with Global Temporal Consistency
    Wei, Xingxing
    Zhu, Jun
    Feng, Sitong
    Su, Hang
    PROCEEDINGS OF THE 2018 ACM MULTIMEDIA CONFERENCE (MM'18), 2018: 18-25
  • [2] Preserving Semantic and Temporal Consistency for Unpaired Video-to-Video Translation
    Park, Kwanyong
    Woo, Sanghyun
    Kim, Dahun
    Cho, Donghyeon
    Kweon, In So
    PROCEEDINGS OF THE 27TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA (MM'19), 2019: 1248-1257
  • [3] HyperCon: Image-To-Video Model Transfer for Video-To-Video Translation Tasks
    Szeto, Ryan
    El-Khamy, Mostafa
    Lee, Jungwon
    Corso, Jason J.
    2021 IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV 2021), 2021: 3079-3088
  • [4] Mocycle-GAN: Unpaired Video-to-Video Translation
    Chen, Yang
    Pan, Yingwei
    Yao, Ting
    Tian, Xinmei
    Mei, Tao
    PROCEEDINGS OF THE 27TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA (MM'19), 2019: 647-655
  • [5] Application of Video-to-Video Translation Networks to Computational Fluid Dynamics
    Kigure, Hiromitsu
    FRONTIERS IN ARTIFICIAL INTELLIGENCE, 2021, 4
  • [6] Unsupervised video-to-video translation with preservation of frame modification tendency
    Liu, Huajun
    Li, Chao
    Lei, Dian
    Zhu, Qing
    VISUAL COMPUTER, 2020, 36 (10-12): 2105-2116
  • [7] Rerender A Video: Zero-Shot Text-Guided Video-to-Video Translation
    Yang, Shuai
    Zhou, Yifan
    Liu, Ziwei
    Loy, Chen Change
    PROCEEDINGS OF THE SIGGRAPH ASIA 2023 CONFERENCE PAPERS, 2023
  • [8] Polygon generation and video-to-video translation for time-series prediction
    Elhefnawy, Mohamed
    Ragab, Ahmed
    Ouali, Mohamed-Salah
    JOURNAL OF INTELLIGENT MANUFACTURING, 2023, 34 (01): 261-279