FSTT: Flow-Guided Spatial Temporal Transformer for Deep Video Inpainting

Cited by: 5

Authors:

Liu, Ruixin [1]

Zhu, Yuesheng [1]

Affiliations:

[1] Peking Univ, Shenzhen Grad Sch, Commun & Informat Secur Lab, Shenzhen 518055, Peoples R China

Source:

ELECTRONICS | 2023 / Volume 12 / Issue 21

Funding:

National Natural Science Foundation of China;

Keywords:

deep video inpainting; video editing; spatial temporal transformer; optical flow; object removal;

DOI:

10.3390/electronics12214452

Chinese Library Classification (CLC) number:

TP [Automation Technology, Computer Technology];

Discipline classification code:

0812;

Abstract:

Video inpainting aims to fill missing regions with content that is consistent both spatially and temporally, so making effective use of the spatio-temporal information in videos is critical. Recent video inpainting methods combine optical flow and transformers to capture this information. However, these methods do not fully exploit optical flow within the transformer, and their transformer blocks cannot effectively integrate spatio-temporal information across frames. To address these problems, we propose a novel video inpainting model, named Flow-Guided Spatial Temporal Transformer (FSTT), which establishes correspondences between missing regions and valid regions in both the spatial and temporal dimensions under the guidance of completed optical flow. Specifically, a Flow-Guided Fusion Feed-Forward module is developed to enhance features with the assistance of optical flow, mitigating the inaccuracies caused by hole pixels when performing multi-head self-attention (MHSA). Additionally, a decomposed spatio-temporal MHSA module is proposed to capture spatio-temporal dependencies in videos. To improve the efficiency of the model, a Global-Local Temporal MHSA module is further designed based on the window partition strategy. Extensive quantitative and qualitative experiments on the DAVIS and YouTube-VOS datasets demonstrate the superiority of the proposed method.

Pages: 20
