FSTT: Flow-Guided Spatial Temporal Transformer for Deep Video Inpainting

Cited by: 5

Authors:

Liu, Ruixin [1]

Zhu, Yuesheng [1]

Affiliations:

[1] Peking Univ, Shenzhen Grad Sch, Commun & Informat Secur Lab, Shenzhen 518055, Peoples R China

Source:

ELECTRONICS | 2023 / Vol. 12 / Issue 21

Funding:

National Natural Science Foundation of China;

Keywords:

deep video inpainting; video editing; spatial temporal transformer; optical flow; object removal;

DOI:

10.3390/electronics12214452

Chinese Library Classification (CLC) number:

TP [Automation Technology, Computer Technology];

Discipline classification code:

0812;

Abstract:

Video inpainting aims to complete the missing regions with content that is consistent both spatially and temporally. How to effectively utilize the spatio-temporal information in videos is critical for video inpainting. Recent video inpainting methods combine optical flow and transformers to capture spatio-temporal information. However, these methods fail to fully explore the potential of optical flow within the transformer. Furthermore, the designed transformer block cannot effectively integrate spatio-temporal information across frames. To address these problems, we propose a novel video inpainting model, named Flow-Guided Spatial Temporal Transformer (FSTT), which effectively establishes correspondences between missing regions and valid regions in both spatial and temporal dimensions under the guidance of completed optical flow. Specifically, a Flow-Guided Fusion Feed-Forward module is developed to enhance features with the assistance of optical flow, mitigating the inaccuracies caused by hole pixels when performing multi-head self-attention (MHSA). Additionally, a decomposed spatio-temporal MHSA module is proposed to effectively capture spatio-temporal dependencies in videos. To improve the efficiency of the model, a Global-Local Temporal MHSA module is further designed based on the window partition strategy. Extensive quantitative and qualitative experiments on the DAVIS and YouTube-VOS datasets demonstrate the superiority of our proposed method.
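
The abstract does not specify how the decomposed spatio-temporal attention is built, so the following is only a minimal sketch of the general idea of factorizing attention into a spatial pass followed by a temporal pass. It is not the authors' module: the function names, the single attention head, the identity query/key/value projections, and the tensor shape `(T, N, D)` (frames, tokens per frame, channels) are all assumptions made for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def self_attention(tokens):
    # tokens: (..., n, d). Single head with identity Q/K/V projections
    # (an illustrative simplification; real models use learned projections).
    d = tokens.shape[-1]
    scores = tokens @ np.swapaxes(tokens, -1, -2) / np.sqrt(d)
    return softmax(scores, axis=-1) @ tokens


def decomposed_st_attention(x):
    # x: (T, N, D) = frames, tokens per frame, channels (hypothetical layout).
    # Spatial pass: each frame's N tokens attend to one another.
    spatial = self_attention(x)
    # Temporal pass: each spatial position attends across the T frames.
    temporal = self_attention(np.swapaxes(spatial, 0, 1))  # (N, T, D)
    return np.swapaxes(temporal, 0, 1)  # back to (T, N, D)


rng = np.random.default_rng(0)
video_tokens = rng.standard_normal((4, 6, 8))  # T=4, N=6, D=8
out = decomposed_st_attention(video_tokens)
print(out.shape)
```

In this sketch, joint attention over all T·N tokens would build a (T·N)×(T·N) score matrix, whereas the factorized version builds T matrices of size N×N and N matrices of size T×T, which is the usual motivation for decomposing the two dimensions.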

Pages: 20

50 items in total

[41] DEEP VIDEO INPAINTING GUIDED BY AUDIO-VISUAL SELF-SUPERVISION
Kim, Kyuyeon
Jung, Junsik
Kim, Woo Jae
Yoon, Sung-Eui
2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 1970 - 1974
[42] FLOW-GUIDED DEFORMABLE ATTENTION NETWORK FOR FAST ONLINE VIDEO SUPER-RESOLUTION
Yang, Xi
Zhang, Xindong
Zhang, Lei
2023 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, ICIP, 2023, : 390 - 394
[43] BSRT: Improving Burst Super-Resolution with Swin Transformer and Flow-Guided Deformable Alignment
Luo, Ziwei
Li, Youwei
Cheng, Shen
Yu, Lei
Wu, Qi
Wen, Zhihong
Fan, Haoqiang
Sun, Jian
Liu, Shuaicheng
2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS, CVPRW 2022, 2022, : 997 - 1007
[44] High resolution video inpainting based on spatial structure and temporal edge information
Bo, Dezhi
Ma, Ran
Wang, Keke
Li, Kai
An, Ping
OPTOELECTRONIC IMAGING AND MULTIMEDIA TECHNOLOGY VII, 2020, 11550
[45] Spatial-Temporal Transformer for Video Snapshot Compressive Imaging
Wang, Lishun
Cao, Miao
Zhong, Yong
Yuan, Xin
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2023, 45 (07) : 9072 - 9089
[46] ShiftFormer: Spatial-Temporal Shift Operation in Video Transformer
Yang, Beiying
Zhu, Guibo
Ge, Guojing
Luo, Jinzhao
Wang, Jinqiao
2023 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO, ICME, 2023, : 1895 - 1900
[47] Attention-guided video super-resolution with recurrent multi-scale spatial–temporal transformer
Wei Sun
Xianguang Kong
Yanning Zhang
Complex & Intelligent Systems, 2023, 9 : 3989 - 4002
[48] Aggregating multi-scale flow-enhanced information in transformer for video inpainting
Li, Guanxiao
Zhang, Ke
Su, Yu
Wang, Jingyu
MULTIMEDIA SYSTEMS, 2025, 31 (01)
[49] Local flow propagation and global multi-scale dilated Transformer for video inpainting
Zuo, Yuting
Chen, Jing
Wang, Kaixing
Lin, Qi
Zeng, Huanqiang
JOURNAL OF VISUAL COMMUNICATION AND IMAGE REPRESENTATION, 2025, 107
[50] Optical Flow-Guided Deep Convolutional Neural Networks for UAV Detection in Infrared Videos
Yang, Xin
Wang, Yi-zheng
Wang, Gang
2022 IEEE 7TH INTERNATIONAL CONFERENCE ON INTELLIGENT TRANSPORTATION ENGINEERING, ICITE, 2022, : 457 - 461