VidToMe: Video Token Merging for Zero-Shot Video Editing

Cited: 0
Authors
Li, Xirui [1 ]
Ma, Chao [1 ]
Yang, Xiaokang [1 ]
Yang, Ming-Hsuan [2 ]
Affiliations
[1] Shanghai Jiao Tong Univ, AI Inst, MoE Key Lab Artificial Intelligence, Shanghai, Peoples R China
[2] UC Merced, Merced, CA USA
Keywords
DOI
10.1109/CVPR52733.2024.00715
CLC number
TP18 [Artificial Intelligence Theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
Diffusion models have made significant advances in generating high-quality images, but their application to video generation has remained challenging due to the complexity of temporal motion. Zero-shot video editing offers a solution by utilizing pre-trained image diffusion models to translate source videos into new ones. Nevertheless, existing methods struggle to maintain strict temporal consistency and low memory consumption. In this work, we propose a novel approach to enhance temporal consistency in generated videos by merging self-attention tokens across frames. By aligning and compressing temporally redundant tokens across frames, our method improves temporal coherence and reduces memory consumption in self-attention computations. The merging strategy matches and aligns tokens according to the temporal correspondence between frames, facilitating natural temporal consistency in generated video frames. To manage the complexity of video processing, we divide videos into chunks and develop intra-chunk local token merging and inter-chunk global token merging, ensuring both short-term video continuity and long-term content consistency. Our video editing approach seamlessly extends the advancements in image editing to video editing, rendering favorable results in temporal consistency over state-of-the-art methods.
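The core idea the abstract describes — merging temporally redundant self-attention tokens of one frame into their best matches in another frame — can be illustrated with a minimal sketch. This is not the paper's implementation; it is a simplified similarity-based merge in NumPy, with the token count, feature dimension, and merge ratio chosen arbitrarily for illustration.

```python
import numpy as np

def merge_tokens(src, dst, ratio=0.5):
    """Merge the most redundant source-frame tokens into a destination frame.

    src, dst: (N, C) arrays of self-attention tokens from two video frames.
    ratio: fraction of src tokens to merge away (higher = more compression).
    Returns the reduced token set: unmerged src tokens + updated dst tokens.
    """
    # Cosine similarity between every src token and every dst token.
    src_n = src / np.linalg.norm(src, axis=1, keepdims=True)
    dst_n = dst / np.linalg.norm(dst, axis=1, keepdims=True)
    sim = src_n @ dst_n.T                 # (N_src, N_dst)

    best_dst = sim.argmax(axis=1)         # best-matching dst token per src token
    best_sim = sim.max(axis=1)

    r = int(len(src) * ratio)             # number of src tokens to merge away
    order = np.argsort(-best_sim)         # most temporally redundant first
    merged, kept = order[:r], order[r:]

    # Average each merged src token into its matched dst token.
    out = dst.copy()
    counts = np.ones(len(dst))
    for i in merged:
        j = best_dst[i]
        out[j] = (out[j] * counts[j] + src[i]) / (counts[j] + 1)
        counts[j] += 1

    return np.concatenate([src[kept], out], axis=0)
```

With a merge ratio of 0.5 and 8 tokens per frame, the 16 input tokens are reduced to 12 before self-attention, which is where the memory savings come from; the chunk-wise local/global variants in the paper apply this idea within and across chunks of frames.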
Pages: 7486-7495
Page count: 10