VidToMe: Video Token Merging for Zero-Shot Video Editing

Cited by: 0
Authors
Li, Xirui [1 ]
Ma, Chao [1 ]
Yang, Xiaokang [1 ]
Yang, Ming-Hsuan [2 ]
Affiliations
[1] Shanghai Jiao Tong Univ, AI Inst, MoE Key Lab Artificial Intelligence, Shanghai, Peoples R China
[2] UC Merced, Merced, CA USA
DOI
10.1109/CVPR52733.2024.00715
Chinese Library Classification (CLC):
TP18 [Artificial Intelligence Theory]
Discipline Codes:
081104; 0812; 0835; 1405
Abstract
Diffusion models have made significant advances in generating high-quality images, but applying them to video generation remains challenging due to the complexity of temporal motion. Zero-shot video editing offers a solution by using pre-trained image diffusion models to translate source videos into new ones. Nevertheless, existing methods struggle to maintain strict temporal consistency while keeping memory consumption low. In this work, we propose a novel approach that enhances temporal consistency in generated videos by merging self-attention tokens across frames. By aligning and compressing temporally redundant tokens, our method improves temporal coherence and reduces the memory cost of self-attention computation. The merging strategy matches and aligns tokens according to the temporal correspondence between frames, yielding natural temporal consistency in the generated video frames. To manage the complexity of video processing, we divide videos into chunks and develop intra-chunk local token merging and inter-chunk global token merging, ensuring both short-term video continuity and long-term content consistency. Our video editing approach seamlessly extends advances in image editing to video, producing favorable temporal-consistency results compared with state-of-the-art methods.
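The cross-frame token merging described in the abstract can be sketched as a simple similarity-based bipartite matching: each token of a source frame is matched to its most similar token in an anchor (destination) frame, and the most redundant source tokens are absorbed by averaging. This is a minimal illustrative sketch; the function `merge_tokens`, the cosine-similarity criterion, the averaging rule, and the fixed merge ratio are assumptions for exposition, not the authors' exact VidToMe implementation.

```python
import numpy as np

def merge_tokens(src, dst, ratio=0.5):
    """Merge the most temporally redundant tokens of a source frame
    into an anchor frame (illustrative sketch, not the paper's code).

    src, dst: (N, C) arrays of self-attention tokens from two frames.
    Returns the merged anchor tokens and the indices of the absorbed
    source tokens.
    """
    # Cosine similarity between every src token and every dst token.
    a = src / np.linalg.norm(src, axis=1, keepdims=True)
    b = dst / np.linalg.norm(dst, axis=1, keepdims=True)
    sim = a @ b.T                           # (N, N) similarity matrix

    best_dst = sim.argmax(axis=1)           # best anchor match per src token
    best_sim = sim.max(axis=1)              # how redundant each src token is

    r = int(len(src) * ratio)               # number of tokens to merge away
    merged_idx = np.argsort(-best_sim)[:r]  # the r most redundant src tokens

    # Absorb each selected src token into its matched dst token by averaging.
    out = dst.copy()
    counts = np.ones(len(dst))
    for i in merged_idx:
        j = best_dst[i]
        out[j] += src[i]
        counts[j] += 1
    out /= counts[:, None]
    return out, merged_idx
```

Because temporally adjacent frames share most of their content, a large fraction of tokens find near-duplicate matches, so merging shrinks the token set fed to self-attention (saving memory) while tying corresponding regions of different frames to shared representations (improving temporal coherence).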
Pages: 7486 - 7495
Page count: 10
Related Papers (50 total)
  • [1] FateZero: Fusing Attentions for Zero-shot Text-based Video Editing
    Qi, Chenyang
    Cun, Xiaodong
    Zhang, Yong
    Lei, Chenyang
    Wang, Xintao
    Shan, Ying
    Chen, Qifeng
    2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2023), 2023, : 15886 - 15896
  • [2] Zero-shot Natural Language Video Localization
    Nam, Jinwoo
    Ahn, Daechul
    Kang, Dongyeop
    Ha, Seong Jong
    Choi, Jonghyun
    2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021), 2021, : 1450 - 1459
  • [3] WAVE: Warping DDIM Inversion Features for Zero-Shot Text-to-Video Editing
    Feng, Yutang
    Gao, Sicheng
    Bao, Yuxiang
    Wang, Xiaodi
    Han, Shumin
    Zhang, Juan
    Zhang, Baochang
    Yao, Angela
    COMPUTER VISION - ECCV 2024, PT LXXVI, 2025, 15134 : 38 - 55
  • [4] Rerender A Video: Zero-Shot Text-Guided Video-to-Video Translation
    Yang, Shuai
    Zhou, Yifan
    Liu, Ziwei
    Loy, Chen Change
    PROCEEDINGS OF THE SIGGRAPH ASIA 2023 CONFERENCE PAPERS, 2023,
  • [5] A Video is Worth 256 Bases: Spatial-Temporal Expectation-Maximization Inversion for Zero-Shot Video Editing
    Li, Maomao
    Li, Yu
    Yang, Tianyu
    Liu, Yunfei
    Yue, Dongxu
    Lin, Zhihui
    Xu, Dong
    2024 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2024, 2024, : 7528 - 7537
  • [6] Zero-Shot Video Grounding for Automatic Video Understanding in Sustainable Smart Cities
    Wang, Ping
    Sun, Li
    Wang, Liuan
    Sun, Jun
    SUSTAINABILITY, 2023, 15 (01)
  • [7] Video Attribute Prototype Network: A New Perspective for Zero-Shot Video Classification
    Wang, Bo
    Zhao, Kaili
    Zhao, Hongyang
    Pu, Shi
    Xiao, Bo
    Guo, Jun
    2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION WORKSHOPS, ICCVW, 2023, : 315 - 324
  • [8] Zero-Shot Video Retrieval Using Content and Concepts
    Dalton, Jeffrey
    Allan, James
    Mirajkar, Pranav
    PROCEEDINGS OF THE 22ND ACM INTERNATIONAL CONFERENCE ON INFORMATION & KNOWLEDGE MANAGEMENT (CIKM'13), 2013, : 1857 - 1860
  • [9] Latent Concept Extraction for Zero-shot Video Retrieval
    Ueki, Kazuya
    2018 INTERNATIONAL CONFERENCE ON IMAGE AND VISION COMPUTING NEW ZEALAND (IVCNZ), 2018,
  • [10] Orthogonal Temporal Interpolation for Zero-Shot Video Recognition
    Zhu, Yan
    Zhuo, Junbao
    Ma, Bin
    Geng, Jiajia
    Wei, Xiaoming
    Wei, Xiaolin
    Wang, Shuhui
    PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2023, 2023, : 7491 - 7501