WAVE: Warping DDIM Inversion Features for Zero-Shot Text-to-Video Editing

Cited by: 0
Authors
Feng, Yutang [1 ,5 ]
Gao, Sicheng [1 ,3 ]
Bao, Yuxiang [1 ]
Wang, Xiaodi [2 ]
Han, Shumin [1 ,2 ]
Zhang, Juan [1 ]
Zhang, Baochang [1 ,4 ]
Yao, Angela [3 ]
Affiliations
[1] Beihang Univ, Beijing, Peoples R China
[2] Baidu VIS, Beijing, Peoples R China
[3] Natl Univ Singapore, Singapore, Singapore
[4] Zhongguancun Lab, Beijing, Peoples R China
[5] Baidu, Beijing, Peoples R China
Funding
National Natural Science Foundation of China; Beijing Natural Science Foundation; National Research Foundation, Singapore;
Keywords
Text to video editing; DDIM inversion; Flow-guided warping;
DOI
10.1007/978-3-031-73116-7_3
CLC Number
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Text-driven video editing has emerged as a prominent application built on the breakthroughs of image diffusion models. Existing state-of-the-art methods focus on zero-shot frameworks due to limited training data and computing resources. To preserve structural consistency, previous frameworks usually employ Denoising Diffusion Implicit Model (DDIM) inversion to provide inverted noise latents as guidance. The key challenge lies in limiting the errors caused by randomness and inaccuracy at each step of naive DDIM inversion, which can lead to temporal inconsistency in video editing tasks. We observe that incorporating temporal keyframe information alleviates the accumulated error during inversion. In this paper, we propose an effective warping strategy in the feature domain to obtain high-quality DDIM-inverted noise latents. Specifically, we randomly shuffle the editing frames at each timestep and use optical flow extracted from the source video to propagate the latent features of the first keyframe to subsequent keyframes. Moreover, we develop a comprehensive zero-shot framework that adapts this strategy to both the inversion and denoising processes, thereby facilitating the generation of consistent edited videos. We compare our method with state-of-the-art text-driven editing methods on various real-world videos with different forms of motion. The project page is available at https://ree1s.github.io/wave/.
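To make the warping step concrete, below is a minimal PyTorch sketch of the flow-guided propagation the abstract describes: the first keyframe's DDIM-inverted latent is warped to the positions of later keyframes via backward warping with torch.nn.functional.grid_sample. This is an illustration, not the authors' released code; the function name warp_latents_with_flow, the tensor shapes, and the flow convention (latent-resolution pixel displacements mapping each later keyframe back to the first) are assumptions, and the per-timestep frame shuffling and the full inversion/denoising pipeline are omitted.

```python
import torch
import torch.nn.functional as F

def warp_latents_with_flow(latents, flow):
    """Propagate the first keyframe's latent to later keyframes.

    latents: (T, C, h, w) DDIM-inverted latents, one per keyframe.
    flow:    (T-1, 2, h, w) optical flow at latent resolution; channel 0 is
             the x displacement, channel 1 the y displacement, mapping each
             later keyframe back to keyframe 0 (assumed convention).
    """
    T, C, h, w = latents.shape
    # Base sampling grid in pixel coordinates.
    ys, xs = torch.meshgrid(
        torch.arange(h, dtype=latents.dtype, device=latents.device),
        torch.arange(w, dtype=latents.dtype, device=latents.device),
        indexing="ij",
    )
    base = torch.stack((xs, ys), dim=0)            # (2, h, w): x then y

    warped = [latents[0]]                          # keyframe 0 is kept as-is
    src = latents[0].unsqueeze(0)                  # (1, C, h, w)
    for t in range(1, T):
        coords = base + flow[t - 1]                # where frame t samples frame 0
        # Normalize pixel coordinates to [-1, 1] as grid_sample expects.
        gx = 2.0 * coords[0] / (w - 1) - 1.0
        gy = 2.0 * coords[1] / (h - 1) - 1.0
        grid = torch.stack((gx, gy), dim=-1).unsqueeze(0)   # (1, h, w, 2)
        warped.append(
            F.grid_sample(src, grid, mode="bilinear",
                          padding_mode="border", align_corners=True)[0]
        )
    return torch.stack(warped)                     # (T, C, h, w)

if __name__ == "__main__":
    # Toy invocation with shapes only; real inputs would come from the
    # VAE-encoded video and an off-the-shelf flow estimator such as RAFT.
    latents = torch.randn(8, 4, 64, 64)            # 8 keyframes of 4-channel latents
    flow = torch.zeros(7, 2, 64, 64)               # zero flow: every frame copies frame 0
    aligned = warp_latents_with_flow(latents, flow)
    print(aligned.shape)                           # torch.Size([8, 4, 64, 64])
```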
Pages: 38-55
Page count: 18