DLFormer: Discrete Latent Transformer for Video Inpainting

Times cited: 20
Authors
Ren, Jingjing [1, 2]
Zheng, Qingqing [3]
Zhao, Yuanyuan [2]
Xu, Xuemiao [1]
Li, Chen [2]
Affiliations
[1] South China Univ Technol, Sch Comp Sci & Engn, Guangzhou, Peoples R China
[2] Tencent Inc, WeChat, Shenzhen, Peoples R China
[3] Chinese Acad Sci, Shenzhen Inst Adv Technol, Shenzhen, Peoples R China
DOI
10.1109/CVPR52688.2022.00350
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Despite the prevalence of data-driven methods, video inpainting, which fills unknown areas in video frames with plausible and coherent content, remains a challenging problem. Although various transformer-based architectures yield promising results for this task, they still suffer from hallucinated blurry content and long-term spatial-temporal inconsistency. Noting the capability of discrete representations for complex reasoning and predictive learning, we propose a novel Discrete Latent Transformer (DLFormer) that reformulates video inpainting in a discrete latent space rather than the continuous feature space used by previous methods. Specifically, we first learn a unique, compact discrete codebook and the corresponding autoencoder to represent the target video. Built upon these representative discrete codes obtained from the entire target video, the subsequent discrete latent transformer is capable of inferring proper codes for unknown areas through a self-attention mechanism, and thus produces fine-grained content with long-term spatial-temporal consistency. Moreover, we explicitly enforce short-term consistency to reduce temporal visual jitter via a temporal aggregation block among adjacent frames. We conduct comprehensive quantitative and qualitative evaluations demonstrating that our method significantly outperforms other state-of-the-art approaches in reconstructing visually plausible, spatially and temporally coherent content with fine-grained details. Code is available at https://github.com/JingjingRenabc/diformer.
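The central idea in the abstract (replace continuous features with indices into a codebook, then predict indices for masked positions from the codes of the whole clip) can be illustrated with a toy sketch. This is not the authors' DLFormer: the codebook is hand-picked rather than learned, there is no autoencoder, there is no temporal aggregation block, and the trained transformer is replaced by a single parameter-free attention step whose weights are a softmax over negative squared distances between (frame, row, column) token positions. The function names `quantize` and `infill_codes` and the `temperature` parameter are hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple

# Hypothetical toy illustration; NOT the DLFormer implementation.
Vector = Sequence[float]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(features: Sequence[Vector], codebook: Sequence[Vector]) -> List[int]:
    """Replace each continuous feature with the index of its nearest codebook entry."""
    return [min(range(len(codebook)), key=lambda k: sq_dist(f, codebook[k]))
            for f in features]


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def infill_codes(
    codes: Sequence[Optional[int]],
    positions: Sequence[Tuple[int, int, int]],
    codebook: Sequence[Vector],
    temperature: float = 1.0,
) -> List[int]:
    """Predict a code for every masked token (None) from all known tokens.

    Each masked token attends to every known token in the clip with weights
    softmax(-||p_query - p_key||^2 / temperature); the weighted sum of the
    known tokens' code embeddings is snapped back to the nearest codebook entry.
    """
    known = [i for i, c in enumerate(codes) if c is not None]
    if not known:
        raise ValueError("at least one token must be known")
    dim = len(codebook[0])
    out: List[int] = []
    for i, c in enumerate(codes):
        if c is not None:
            out.append(c)  # known tokens keep their code
            continue
        weights = softmax([-sq_dist(positions[i], positions[j]) / temperature
                           for j in known])
        mixed = [sum(w * codebook[codes[j]][d] for w, j in zip(weights, known))
                 for d in range(dim)]
        out.append(quantize([mixed], codebook)[0])
    return out


if __name__ == "__main__":
    codebook = [[0.0, 0.0], [1.0, 1.0]]  # two toy 2-D code embeddings
    features = [[0.1, 0.0], [0.9, 1.1], [0.8, 0.9],  # frame 0: one row of 3 tokens
                [0.0, 0.2], [1.0, 0.8], [0.7, 1.0]]  # frame 1
    positions = [(t, 0, x) for t in range(2) for x in range(3)]
    codes = quantize(features, codebook)
    print(codes)  # [0, 1, 1, 0, 1, 1]
    masked = list(codes)
    masked[4] = None  # hide the middle token of frame 1
    print(infill_codes(masked, positions, codebook))  # [0, 1, 1, 0, 1, 1]
```

Because the prediction is a choice among codebook entries rather than a regressed feature, the infilled token cannot drift to an averaged, in-between value; this is a rough intuition for why a discrete latent space may help against blurry hallucinations, not a reproduction of the paper's results.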
Pages: 3501-3510
Number of pages: 10