WAVE: Warping DDIM Inversion Features for Zero-Shot Text-to-Video Editing

Authors
Feng, Yutang [1, 5]
Gao, Sicheng [1, 3]
Bao, Yuxiang [1]
Wang, Xiaodi [2]
Han, Shumin [1, 2]
Zhang, Juan [1]
Zhang, Baochang [1, 4]
Yao, Angela [3]
Affiliations
[1] Beihang Univ, Beijing, Peoples R China
[2] Baidu VIS, Beijing, Peoples R China
[3] Natl Univ Singapore, Singapore, Singapore
[4] Zhongguancun Lab, Beijing, Peoples R China
[5] Baidu, Beijing, Peoples R China
Funding
National Natural Science Foundation of China; Beijing Natural Science Foundation; National Research Foundation, Singapore
Keywords
Text to video editing; DDIM inversion; Flow-guided warping
DOI
10.1007/978-3-031-73116-7_3
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Text-driven video editing has emerged as a prominent application based on the breakthroughs of image diffusion models. Existing state-of-the-art methods focus on zero-shot frameworks due to limited training data and computing resources. To preserve structure consistency, previous frameworks usually employ Denoising Diffusion Implicit Model (DDIM) inversion to provide inverted noise latents as guidance. The key challenge lies in limiting errors caused by the randomness and inaccuracy in each step of the naive DDIM inversion process, which can lead to temporal inconsistency in video editing tasks. Our observation indicates that incorporating temporal keyframe information can alleviate the accumulated error during inversion. In this paper, we propose an effective warping strategy in the feature domain to obtain high-quality DDIM inverted noise latents. Specifically, we shuffle the editing frames randomly in each timestep and use optical flow extracted from the source video to propagate the latent features of the first keyframe to subsequent keyframes. Moreover, we develop a comprehensive zero-shot framework that adapts to this strategy in both the inversion and denoising processes, thereby facilitating the generation of consistent edited videos. We compare our method with state-of-the-art text-driven editing methods on various real-world videos with different forms of motion. The project page is available at https://ree1s.github.io/wave/.
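The flow-guided propagation mentioned in the abstract can be illustrated with a generic backward warp of a feature map. This is a minimal sketch under stated assumptions, not the authors' implementation: the function name `warp_features`, the `(C, H, W)` feature layout, the backward flow convention, the bilinear interpolation, and the border clamping are all choices made here for illustration. The record does not specify any of them, and the random frame shuffling described in the abstract is not modeled.

```python
import numpy as np


def warp_features(feat, flow):
    """Backward-warp a feature map with a dense flow field (illustrative only).

    feat: (C, H, W) array, features of the reference keyframe (assumed layout).
    flow: (2, H, W) array; flow[0] is the x offset and flow[1] the y offset,
          in pixels at feature resolution, pointing from each target location
          to the location it samples in the reference frame (assumed convention).
    """
    _, h, w = feat.shape
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")

    # Source coordinates for every target location, clamped to the borders.
    src_x = np.clip(xs + flow[0], 0, w - 1)
    src_y = np.clip(ys + flow[1], 0, h - 1)

    # Integer corners and fractional weights for bilinear interpolation.
    x0 = np.floor(src_x).astype(int)
    y0 = np.floor(src_y).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)
    y1 = np.minimum(y0 + 1, h - 1)
    wx = src_x - x0
    wy = src_y - y0

    # Interpolate along x on the two rows, then along y between the rows.
    top = feat[:, y0, x0] * (1 - wx) + feat[:, y0, x1] * wx
    bottom = feat[:, y1, x0] * (1 - wx) + feat[:, y1, x1] * wx
    return top * (1 - wy) + bottom * wy


# Usage: propagate first-keyframe features to a later keyframe.
first_keyframe = np.random.rand(4, 8, 8)   # stand-in latent features
flow_to_first = np.zeros((2, 8, 8))        # stand-in optical flow
flow_to_first[0] = -1.0                    # each location samples one pixel to its left
propagated = warp_features(first_keyframe, flow_to_first)
```

In practice the flow would come from an optical flow estimator run on the source video and resized to the feature resolution; that step is outside the scope of this sketch.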
Pages: 38-55
Number of pages: 18