FSTT: Flow-Guided Spatial Temporal Transformer for Deep Video Inpainting

Cited by: 5
Authors
Liu, Ruixin [1 ]
Zhu, Yuesheng [1 ]
Affiliations
[1] Peking Univ, Shenzhen Grad Sch, Commun & Informat Secur Lab, Shenzhen 518055, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
deep video inpainting; video editing; spatial temporal transformer; optical flow; object removal;
DOI
10.3390/electronics12214452
Chinese Library Classification number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812 ;
Abstract
Video inpainting aims to complete the missing regions with content that is consistent both spatially and temporally. How to effectively utilize the spatio-temporal information in videos is critical for video inpainting. Recent advances in video inpainting methods combine both optical flow and transformers to capture spatio-temporal information. However, these methods fail to fully explore the potential of optical flow within the transformer. Furthermore, the designed transformer block cannot effectively integrate spatio-temporal information across frames. To address the above problems, we propose a novel video inpainting model, named Flow-Guided Spatial Temporal Transformer (FSTT), which effectively establishes correspondences between missing regions and valid regions in both spatial and temporal dimensions under the guidance of completed optical flow. Specifically, a Flow-Guided Fusion Feed-Forward module is developed to enhance features with the assistance of optical flow, mitigating the inaccuracies caused by hole pixels when performing multi-head self-attention (MHSA). Additionally, a decomposed spatio-temporal MHSA module is proposed to effectively capture spatio-temporal dependencies in videos. To improve the efficiency of the model, a Global-Local Temporal MHSA module is further designed based on the window partition strategy. Extensive quantitative and qualitative experiments on the DAVIS and YouTube-VOS datasets demonstrate the superiority of our proposed method.
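The abstract does not spell out how the decomposed spatio-temporal attention is computed, so the following is only a minimal sketch of the general idea of factorizing attention into a spatial pass and a temporal pass. Everything in it is an assumption made for illustration: the function names, the (T, N, C) token layout, the spatial-then-temporal order, the single head, and the absence of learned query/key/value projections. It does not model the Flow-Guided Fusion Feed-Forward module or the window partition strategy, and it should not be read as the authors' implementation.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def self_attention(tokens):
    # Hypothetical single-head self-attention with identity projections.
    # tokens: (..., L, C); attention runs over the L axis, batched over the rest.
    c = tokens.shape[-1]
    scores = tokens @ np.swapaxes(tokens, -1, -2) / np.sqrt(c)  # (..., L, L)
    return softmax(scores, axis=-1) @ tokens


def decomposed_st_attention(x):
    # x: (T, N, C) = frames, spatial tokens per frame, channels (assumed layout).
    # Spatial pass: each frame attends over its own N tokens.
    x = x + self_attention(x)
    # Temporal pass: each spatial position attends over the T frames.
    xt = np.swapaxes(x, 0, 1)          # (N, T, C)
    xt = xt + self_attention(xt)
    return np.swapaxes(xt, 0, 1)       # back to (T, N, C)


def attention_pairs(t, n):
    # Number of query-key pairs scored by joint vs. decomposed attention.
    joint = (t * n) ** 2
    decomposed = t * n * n + n * t * t
    return joint, decomposed
```

For example, with 5 frames and 16 tokens per frame, `attention_pairs(5, 16)` counts 6400 scored pairs for joint attention over all tokens against 1680 for the two factorized passes, which is the usual motivation for decomposing attention in video models.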
Pages: 20
Related papers
50 in total
  • [31] Deep Transformer Based Video Inpainting Using Fast Fourier Tokenization
    Kim, Taewan
    Kim, Jinwoo
    Oh, Heeseok
    Kang, Jiwoo
    IEEE ACCESS, 2024, 12 : 21723 - 21736
  • [32] Temporal-Spatial Generative Adversarial Networks for Video Inpainting
    Yu B.
    Ding Y.
    Xie Z.
    Huang D.
    Ma L.
    Xie, Zhifeng (zhifeng_xie@shu.edu.cn), 1600, Institute of Computing Technology (32): : 769 - 779
  • [33] Video Inpainting by Jointly Learning Temporal Structure and Spatial Details
    Wang, Chuan
    Huang, Haibin
    Han, Xiaoguang
    Wang, Jue
    THIRTY-THIRD AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FIRST INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE / NINTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2019, : 5232 - 5239
  • [34] Flow-Guided Single Object Tracking Framework In UAV Aerial Video
    Zhu, Wenjun
    Yu, Xi
    Meng, Jun
    PROCEEDINGS OF THE 32ND 2020 CHINESE CONTROL AND DECISION CONFERENCE (CCDC 2020), 2020, : 461 - 468
  • [35] Exploiting Optical Flow Guidance for Transformer-Based Video Inpainting
    Zhang, Kaidong
    Peng, Jialun
    Fu, Jingjing
    Liu, Dong
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2024, 46 (07) : 4977 - 4992
  • [36] Video deblurring and flow-guided feature aggregation for obstacle detection in agricultural videos
    Keyang Cheng
    Xuesen Zhu
    Yongzhao Zhan
    Yunshen Pei
    International Journal of Multimedia Information Retrieval, 2022, 11 : 577 - 588
  • [37] Video deblurring and flow-guided feature aggregation for obstacle detection in agricultural videos
    Cheng, Keyang
    Zhu, Xuesen
    Zhan, Yongzhao
    Pei, Yunshen
    INTERNATIONAL JOURNAL OF MULTIMEDIA INFORMATION RETRIEVAL, 2022, 11 (04) : 577 - 588
  • [38] SpecReFlow: an algorithm for specular reflection restoration using flow-guided video completion
    Yin, Haoli
    Eimen, Rachel
    Moyer, Daniel
    Bowden, Audrey K.
    JOURNAL OF MEDICAL IMAGING, 2024, 11 (02)
  • [39] Inertia-Guided Flow Completion and Style Fusion for Video Inpainting
    Zhang, Kaidong
    Fu, Jingjing
    Liu, Dong
    Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2022, 2022-June : 5972 - 5981
  • [40] Inertia-Guided Flow Completion and Style Fusion for Video Inpainting
    Zhang, Kaidong
    Fu, Jingjing
    Liu, Dong
    2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022), 2022, : 5972 - 5981