DLFormer: Discrete Latent Transformer for Video Inpainting

Cited by: 20
|
Authors
Ren, Jingjing [1 ,2 ]
Zheng, Qingqing [3 ]
Zhao, Yuanyuan [2 ]
Xu, Xuemiao [1 ]
Li, Chen [2 ]
Affiliations
[1] South China Univ Technol, Sch Comp Sci & Engn, Guangzhou, Peoples R China
[2] Tencent Inc, WeChat, Shenzhen, Peoples R China
[3] Chinese Acad Sci, Shenzhen Inst Adv Technol, Shenzhen, Peoples R China
Source
2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022) | 2022
DOI
10.1109/CVPR52688.2022.00350
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Video inpainting, which aims to fill unknown regions of video frames with plausible and coherent content, remains challenging despite the prevalence of data-driven methods. Although various transformer-based architectures yield promising results for this task, they still suffer from hallucinated blurry content and long-term spatial-temporal inconsistency. Noting the capability of discrete representations for complex reasoning and predictive learning, we propose a novel Discrete Latent Transformer (DLFormer) that reformulates video inpainting in a discrete latent space rather than the previous continuous feature space. Specifically, we first learn a compact discrete codebook and the corresponding autoencoder to represent the target video. Built upon the representative discrete codes obtained from the entire target video, the subsequent discrete latent transformer can infer proper codes for unknown areas via a self-attention mechanism, and thus produces fine-grained content with long-term spatial-temporal consistency. Moreover, we explicitly enforce short-term consistency to relieve temporal jitter via a temporal aggregation block across adjacent frames. Comprehensive quantitative and qualitative evaluations demonstrate that our method significantly outperforms other state-of-the-art approaches in reconstructing visually plausible and spatially-temporally coherent content with fine-grained details. Code is available at https://github.com/JingjingRenabc/diformer.
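The core of the discrete-latent idea in the abstract is that continuous encoder features are replaced by indices into a learned codebook, so that inpainting becomes prediction of discrete codes for masked positions. The snippet below is a minimal NumPy sketch of only that quantization step (nearest-codebook lookup, as in VQ-style models); the `quantize` function and the toy codebook are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def quantize(features, codebook):
    """Map each continuous feature vector to the index of its nearest
    codebook entry under L2 distance, yielding discrete latent codes.

    features: (N, D) array of encoder features
    codebook: (K, D) array of learned code vectors
    returns:  (N,) array of integer code indices
    """
    # Pairwise squared distances via broadcasting: (N, K)
    dists = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    return dists.argmin(axis=1)

# Toy example: a 2-entry codebook and 3 feature vectors.
codebook = np.array([[0.0, 0.0],
                     [1.0, 1.0]])
feats = np.array([[0.1, -0.1],
                  [0.9,  1.2],
                  [0.6,  0.6]])
codes = quantize(feats, codebook)
print(codes.tolist())  # -> [0, 1, 1]
```

In a full pipeline, codes at known positions would condition a transformer that predicts codes for the masked positions, and a decoder would map the completed code map back to pixels; this sketch covers only the encoding of features into discrete indices.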
Pages: 3501-3510 (10 pages)