DLFormer: Discrete Latent Transformer for Video Inpainting

Cited by: 20
|
Authors
Ren, Jingjing [1 ,2 ]
Zheng, Qingqing [3 ]
Zhao, Yuanyuan [2 ]
Xu, Xuemiao [1 ]
Li, Chen [2 ]
Affiliations
[1] South China Univ Technol, Sch Comp Sci & Engn, Guangzhou, Peoples R China
[2] Tencent Inc, WeChat, Shenzhen, Peoples R China
[3] Chinese Acad Sci, Shenzhen Inst Adv Technol, Shenzhen, Peoples R China
Source
2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022) | 2022
DOI
10.1109/CVPR52688.2022.00350
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Video inpainting, which aims to fill unknown regions of video frames with plausible and coherent content, remains challenging despite the prevalence of data-driven methods. Although various transformer-based architectures yield promising results for this task, they still suffer from hallucinated blurry content and long-term spatial-temporal inconsistency. Noting the capability of discrete representations for complex reasoning and predictive learning, we propose a novel Discrete Latent Transformer (DLFormer) that reformulates video inpainting in a discrete latent space rather than the previous continuous feature space. Specifically, we first learn a compact discrete codebook and the corresponding autoencoder to represent the target video. Built upon the representative discrete codes obtained from the entire target video, the subsequent discrete latent transformer can infer proper codes for unknown areas via a self-attention mechanism, and thus produces fine-grained content with long-term spatial-temporal consistency. Moreover, we explicitly enforce short-term consistency to relieve temporal jitter via a temporal aggregation block across adjacent frames. Comprehensive quantitative and qualitative evaluations demonstrate that our method significantly outperforms other state-of-the-art approaches in reconstructing visually plausible and spatially-temporally coherent content with fine-grained details. Code is available at https://github.com/JingjingRenabc/diformer.
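The core of the discrete-latent idea in the abstract is that continuous encoder features are replaced by indices into a learned codebook, so that inpainting becomes prediction of discrete codes for masked positions. The snippet below is a minimal NumPy sketch of only that quantization step (nearest-codebook lookup, as in VQ-style models); the `quantize` function and the toy codebook are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def quantize(features, codebook):
    """Map each continuous feature vector to the index of its nearest
    codebook entry under L2 distance, yielding discrete latent codes.

    features: (N, D) array of encoder features
    codebook: (K, D) array of learned code vectors
    returns:  (N,) array of integer code indices
    """
    # Pairwise squared distances via broadcasting: (N, K)
    dists = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    return dists.argmin(axis=1)

# Toy example: a 2-entry codebook and 3 feature vectors.
codebook = np.array([[0.0, 0.0],
                     [1.0, 1.0]])
feats = np.array([[0.1, -0.1],
                  [0.9,  1.2],
                  [0.6,  0.6]])
codes = quantize(feats, codebook)
print(codes.tolist())  # -> [0, 1, 1]
```

In a full pipeline, codes at known positions would condition a transformer that predicts codes for the masked positions, and a decoder would map the completed code map back to pixels; this sketch covers only the encoding of features into discrete indices.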
Pages: 3501-3510 (10 pages)