VidToMe: Video Token Merging for Zero-Shot Video Editing

Authors
Li, Xirui [1]
Ma, Chao [1]
Yang, Xiaokang [1]
Yang, Ming-Hsuan [2]
Affiliations
[1] Shanghai Jiao Tong University, AI Institute, MoE Key Lab of Artificial Intelligence, Shanghai, People's Republic of China
[2] UC Merced, Merced, CA, USA
DOI
10.1109/CVPR52733.2024.00715
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Diffusion models have made significant advances in generating high-quality images, but their application to video generation has remained challenging due to the complexity of temporal motion. Zero-shot video editing offers a solution by utilizing pre-trained image diffusion models to translate source videos into new ones. Nevertheless, existing methods struggle to maintain strict temporal consistency and efficient memory consumption. In this work, we propose a novel approach to enhance temporal consistency in generated videos by merging self-attention tokens across frames. By aligning and compressing temporally redundant tokens across frames, our method improves temporal coherence and reduces memory consumption in self-attention computations. The merging strategy matches and aligns tokens according to the temporal correspondence between frames, facilitating natural temporal consistency in generated video frames. To manage the complexity of video processing, we divide videos into chunks and develop intra-chunk local token merging and inter-chunk global token merging, ensuring both short-term video continuity and long-term content consistency. Our video editing approach seamlessly extends the advancements in image editing to video editing, rendering favorable results in temporal consistency over state-of-the-art methods.
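The cross-frame merging idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `cosine` and `merge_frame_tokens`, the parameter `r` (number of tokens to merge), the use of cosine similarity as the matching criterion, and plain averaging as the merge rule are all assumptions made for illustration. The abstract only states that tokens are matched and aligned according to temporal correspondence between frames.

```python
import math


def cosine(a, b):
    """Cosine similarity between two token vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def merge_frame_tokens(dst, src, r):
    """Merge the r tokens of frame `src` that best match a token of frame `dst`.

    Hypothetical helper: each source token is paired with its most similar
    destination token; the r most similar pairs are averaged into the
    destination token, and the remaining source tokens are kept unchanged.
    """
    # Find the best destination match for every source token.
    matches = []
    for i, s in enumerate(src):
        j, sim = max(((k, cosine(s, d)) for k, d in enumerate(dst)),
                     key=lambda pair: pair[1])
        matches.append((sim, i, j))

    # Keep only the r most similar (most temporally redundant) pairs.
    matches.sort(reverse=True)
    to_merge = matches[:r]

    # Average each merged source token into its matched destination token.
    sums = [list(d) for d in dst]
    counts = [1] * len(dst)
    for _, i, j in to_merge:
        sums[j] = [a + b for a, b in zip(sums[j], src[i])]
        counts[j] += 1
    merged = [[v / c for v in s] for s, c in zip(sums, counts)]

    # Source tokens that were not merged stay in the joint token set.
    merged_ids = {i for _, i, _ in to_merge}
    kept = [list(src[i]) for i in range(len(src)) if i not in merged_ids]
    return merged + kept


# Two frames with two 2-D tokens each; merging one pair leaves 3 tokens.
frame_a = [[1, 0], [0, 1]]
frame_b = [[2, 0], [1, 1]]
tokens = merge_frame_tokens(frame_a, frame_b, r=1)
print(len(tokens))
```

In this sketch, self-attention would then run over the shorter merged token set, which is where the memory saving mentioned in the abstract would come from. Applying the same routine within a chunk of frames and again across chunks would loosely mirror the local and global merging the abstract describes.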
Pages: 7486-7495
Number of pages: 10