DLFormer: Discrete Latent Transformer for Video Inpainting

Cited by: 20

Authors:

Ren, Jingjing [1,2]

Zheng, Qingqing [3]

Zhao, Yuanyuan [2]

Xu, Xuemiao [1]

Li, Chen [2]

Affiliations:

[1] South China Univ Technol, Sch Comp Sci & Engn, Guangzhou, Peoples R China

[2] Tencent Inc, WeChat, Shenzhen, Peoples R China

[3] Chinese Acad Sci, Shenzhen Inst Adv Technol, Shenzhen, Peoples R China

Source:

2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022) | 2022

Keywords:

DOI:

10.1109/CVPR52688.2022.00350

Chinese Library Classification number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Video inpainting, which fills unknown areas in video frames with plausible and coherent content, remains a challenging problem despite the prevalence of data-driven methods. Although various transformer-based architectures yield promising results for this task, they still suffer from hallucinating blurry content and from long-term spatial-temporal inconsistency. Noting the capability of discrete representations for complex reasoning and predictive learning, we propose a novel Discrete Latent Transformer (DLFormer) that reformulates video inpainting in the discrete latent space rather than the previous continuous feature space. Specifically, we first learn a unique compact discrete codebook and the corresponding autoencoder to represent the target video. Built upon these representative discrete codes obtained from the entire target video, the subsequent discrete latent transformer is capable of inferring proper codes for unknown areas under a self-attention mechanism, and thus produces fine-grained content with long-term spatial-temporal consistency. Moreover, we explicitly enforce short-term consistency to relieve temporal visual jitter via a temporal aggregation block among adjacent frames. We conduct comprehensive quantitative and qualitative evaluations to demonstrate that our method significantly outperforms other state-of-the-art approaches in reconstructing visually plausible and spatial-temporally coherent content with fine-grained details. Code is available at https://github.com/JingjingRenabc/diformer.
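
The codebook step mentioned in the abstract can be illustrated with a generic vector-quantization lookup. The following is a minimal sketch, not the authors' implementation: the function names (`quantize`, `decode`), the toy two-dimensional codebook, and the feature values are all assumptions made for illustration, and the transformer that infers codes for unknown areas is not shown.

```python
from typing import List, Sequence


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(features: Sequence[Sequence[float]],
             codebook: Sequence[Sequence[float]]) -> List[int]:
    """Map each feature vector to the index of its nearest codebook entry."""
    indices = []
    for feature in features:
        nearest = min(
            range(len(codebook)),
            key=lambda k: squared_distance(feature, codebook[k]),
        )
        indices.append(nearest)
    return indices


def decode(indices: Sequence[int],
           codebook: Sequence[Sequence[float]]) -> List[List[float]]:
    """Replace each discrete code index with its codebook vector."""
    return [list(codebook[i]) for i in indices]


# Hypothetical toy data: three codebook entries and three encoder features.
codebook = [[0.0, 0.0], [1.0, 1.0], [4.0, 0.0]]
features = [[0.1, -0.2], [0.9, 1.2], [3.5, 0.3]]

codes = quantize(features, codebook)       # discrete latent codes
reconstructed = decode(codes, codebook)    # quantized features for a decoder
print(codes)
```

In a setting like the one the abstract describes, codes at masked positions would be predicted by a model rather than looked up, since no reliable features exist there; this sketch covers only the lookup for known positions.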


Pages: 3501-3510

Page count: 10

50 items in total

[1] DLFormer: Discrete Latent Transformer for Video Inpainting
School of Computer Science and Engineering, South China University of Technology, China
Unknown
Unknown
Proc IEEE Comput Soc Conf Comput Vision Pattern Recognit, 1600, (3501-3510):
[2] Feature pre-inpainting enhanced transformer for video inpainting
Li, Guanxiao
Zhang, Ke
Su, Yu
Wang, Jingyu
ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2023, 123
[3] Latent Video Transformer
Rakhimov, Ruslan
Volkhonskiy, Denis
Artemov, Alexey
Zorin, Denis
Burnaev, Evgeny
VISAPP: PROCEEDINGS OF THE 16TH INTERNATIONAL JOINT CONFERENCE ON COMPUTER VISION, IMAGING AND COMPUTER GRAPHICS THEORY AND APPLICATIONS - VOL. 5: VISAPP, 2021, : 101 - 112
[4] ProPainter: Improving Propagation and Transformer for Video Inpainting
Zhou, Shangchen
Li, Chongyi
Chan, Kelvin C. K.
Loy, Chen Change
2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2023), 2023, : 10443 - 10452
[5] Flow-Guided Transformer for Video Inpainting
Zhang, Kaidong
Fu, Jingjing
Liu, Dong
COMPUTER VISION - ECCV 2022, PT XVIII, 2022, 13678 : 74 - 90
[6] Discrete codebook collaborating with transformer for thangka image inpainting
Bai, Jinxian
Fan, Yao
Zhao, Zhiwei
MULTIMEDIA SYSTEMS, 2024, 30 (05)
[7] Semantic-Aware Dynamic Parameter for Video Inpainting Transformer
Lee, Eunhye
Yoo, Jinsu
Yang, Yunjeong
Baik, Sungyong
Kim, Tae Hyun
2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2023), 2023, : 12903 - 12912
[8] WTVI: A Wavelet-Based Transformer Network for Video Inpainting
Zhang, Ke
Li, Guanxiao
Su, Yu
Wang, Jingyu
IEEE SIGNAL PROCESSING LETTERS, 2024, 31 : 616 - 620
[9] WaveFormer: Wavelet Transformer for Noise-Robust Video Inpainting
Wu, Zhiliang
Sun, Changchang
Xuan, Hanyu
Liu, Gaowen
Yan, Yan
THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 6, 2024, : 6180 - 6188
[10] Spatio-Temporal Inference Transformer Network for Video Inpainting
Tudavekar, Gajanan
Saraf, Santosh S.
Patil, Sanjay R.
INTERNATIONAL JOURNAL OF IMAGE AND GRAPHICS, 2023, 23 (01)
