DLFormer: Discrete Latent Transformer for Video Inpainting

Cited by: 20
Authors:
Ren, Jingjing [1 ,2 ]
Zheng, Qingqing [3 ]
Zhao, Yuanyuan [2 ]
Xu, Xuemiao [1 ]
Li, Chen [2 ]
Affiliations:
[1] South China Univ Technol, Sch Comp Sci & Engn, Guangzhou, Peoples R China
[2] Tencent Inc, WeChat, Shenzhen, Peoples R China
[3] Chinese Acad Sci, Shenzhen Inst Adv Technol, Shenzhen, Peoples R China
DOI:
10.1109/CVPR52688.2022.00350
CLC number: TP18 [Artificial Intelligence Theory]
Discipline codes: 081104; 0812; 0835; 1405
Abstract:
Despite the prevalence of data-driven methods, video inpainting, i.e., filling the unknown areas of video frames with plausible and coherent content, remains a challenging problem. Although various transformer-based architectures yield promising results for this task, they still suffer from hallucinated blurry content and long-term spatial-temporal inconsistency. Noting the capability of discrete representations for complex reasoning and predictive learning, we propose a novel Discrete Latent Transformer (DLFormer) that reformulates video inpainting in a discrete latent space rather than the continuous feature space used by previous methods. Specifically, we first learn a compact discrete codebook and the corresponding autoencoder to represent the target video. Built upon the representative discrete codes obtained from the entire target video, the subsequent discrete latent transformer infers proper codes for unknown areas via a self-attention mechanism, and thus produces fine-grained content with long-term spatial-temporal consistency. Moreover, we explicitly enforce short-term consistency to relieve temporal visual jitter via a temporal aggregation block over adjacent frames. Comprehensive quantitative and qualitative evaluations demonstrate that our method significantly outperforms other state-of-the-art approaches in reconstructing visually plausible and spatial-temporally coherent content with fine-grained details. Code is available at https://github.com/JingjingRenabc/diformer.
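For a concrete picture of the pipeline the abstract describes, below is a minimal sketch in PyTorch, assuming a VQ-VAE-style codebook: every class name, size, and the [MASK]-token scheme here is a hypothetical illustration, not the authors' released implementation (see the linked repository for that).

    # Minimal sketch (hypothetical names/sizes): a discrete codebook lookup plus
    # a transformer that predicts codebook indices for masked (unknown) positions.
    import torch
    import torch.nn as nn

    class Quantizer(nn.Module):
        """Map encoder features to their nearest entries in a learned codebook."""
        def __init__(self, num_codes=512, dim=256):
            super().__init__()
            self.codebook = nn.Embedding(num_codes, dim)

        def forward(self, feats):  # feats: (B, N, dim) continuous features
            # Nearest-neighbour lookup (straight-through training trick omitted).
            dists = torch.cdist(feats, self.codebook.weight.expand(feats.size(0), -1, -1))
            idx = dists.argmin(dim=-1)            # (B, N) discrete code indices
            return self.codebook(idx), idx

    class CodeTransformer(nn.Module):
        """Infer codebook indices for unknown positions via self-attention."""
        def __init__(self, num_codes=512, dim=256, depth=6, heads=8):
            super().__init__()
            self.tok = nn.Embedding(num_codes + 1, dim)  # extra id = [MASK] token
            layer = nn.TransformerEncoderLayer(dim, heads, batch_first=True)
            self.encoder = nn.TransformerEncoder(layer, num_layers=depth)
            self.head = nn.Linear(dim, num_codes)

        def forward(self, idx, mask):  # idx: (B, N) long, mask: (B, N) bool
            inp = idx.masked_fill(mask, self.head.out_features)  # insert [MASK]
            logits = self.head(self.encoder(self.tok(inp)))      # (B, N, num_codes)
            # Keep known codes; take argmax predictions where the region is unknown.
            return torch.where(mask, logits.argmax(dim=-1), idx)

    # Usage: quantize features of all frames, then complete the masked region.
    quant, tform = Quantizer(), CodeTransformer()
    feats = torch.randn(1, 16, 256)                # stand-in for encoder features
    _, idx = quant(feats)
    mask = torch.zeros(1, 16, dtype=torch.bool)
    mask[:, 4:8] = True                            # mark an unknown region
    completed = tform(idx, mask)                   # codes ready for the decoder

In this sketch the completed code indices would be mapped back to pixels by the autoencoder's decoder; positional embeddings and the temporal aggregation block that enforces short-term consistency are omitted for brevity.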
Pages: 3501-3510
Page count: 10
Related papers (50 total):
  • [41] Chen, Bo-Wei; Liu, Tsung-Jung; Liu, Kuan-Hsien. Image Inpainting by MSCSwin Transformer Adversarial Autoencoder. 2023 IEEE International Conference on Image Processing (ICIP), 2023: 2040-2044.
  • [42] Lin, Jiayu; Wang, Yuan-gen. TSFormer: Tracking Structure Transformer for Image Inpainting. ACM Transactions on Multimedia Computing, Communications and Applications, 2024, 20(12).
  • [43] Liang, Huining; Kambhamettu, Chandra. Edge-Guided Image Inpainting with Transformer. Advances in Visual Computing (ISVC 2023), Part II, 2023, 14362: 285-296.
  • [44] Deng, Ye; Hui, Siqi; Zhou, Sanping; Meng, Deyu; Wang, Jinjun. Learning Contextual Transformer Network for Image Inpainting. Proceedings of the 29th ACM International Conference on Multimedia (MM 2021), 2021: 2529-2538.
  • [45] Chang, Rong-Chi; Tang, Nick C.; Chao, Chia Cheng. Application of Inpainting Technology to Video Restoration. 2008 First IEEE International Conference on Ubi-Media Computing and Workshops, Proceedings, 2008: 359-364.
  • [46] Jia, Lili; Tao, Junjie; You, Ying. Character Superimposition Inpainting in Surveillance Video. International Conference on Optoelectronics and Microelectronics Technology and Application, 2017, 10244.
  • [47] March, Riccardo; Riey, Giuseppe. Properties of a Variational Model for Video Inpainting. Networks & Spatial Economics, 2022, 22(2): 315-326.
  • [48] Lou, Zijie; Cao, Gang; Lin, Man. Video Inpainting Localization With Contrastive Learning. IEEE Signal Processing Letters, 2025, 32: 611-615.
  • [49] Le, Thuc Trinh; Almansa, Andres; Gousseau, Yann; Masnou, Simon. Motion-Consistent Video Inpainting. 2017 24th IEEE International Conference on Image Processing (ICIP), 2017: 2094-2098.
  • [50] Tsai, Joseph C.; Shih, Timothy K.; Wattanachote, Kanoksak; Li, Kuan-Ching. Video Editing Using Motion Inpainting. 2012 IEEE 26th International Conference on Advanced Information Networking and Applications (AINA), 2012: 649-654.