FSTT: Flow-Guided Spatial Temporal Transformer for Deep Video Inpainting

Times cited: 5
Authors
Liu, Ruixin [1]
Zhu, Yuesheng [1]
Affiliation
[1] Peking Univ, Shenzhen Grad Sch, Commun & Informat Secur Lab, Shenzhen 518055, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
deep video inpainting; video editing; spatial temporal transformer; optical flow; object removal;
DOI
10.3390/electronics12214452
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Video inpainting aims to fill missing regions with content that is consistent both spatially and temporally, so effectively exploiting the spatio-temporal information in videos is critical. Recent video inpainting methods combine optical flow and transformers to capture this information. However, these methods do not fully exploit optical flow within the transformer, and their transformer blocks cannot effectively integrate spatio-temporal information across frames. To address these problems, we propose a novel video inpainting model, the Flow-Guided Spatial Temporal Transformer (FSTT), which establishes correspondences between missing and valid regions in both the spatial and temporal dimensions under the guidance of completed optical flow. Specifically, a Flow-Guided Fusion Feed-Forward module enhances features with the assistance of optical flow, mitigating the inaccuracies that hole pixels introduce when performing multi-head self-attention (MHSA). In addition, a decomposed spatio-temporal MHSA module is proposed to capture spatio-temporal dependencies in videos effectively. To improve the efficiency of the model, a Global-Local Temporal MHSA module is further designed based on a window partition strategy. Extensive quantitative and qualitative experiments on the DAVIS and YouTube-VOS datasets demonstrate the superiority of the proposed method.
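The abstract names a decomposed spatio-temporal MHSA module but gives no equations. One common way to decompose attention over a video feature tensor is to run self-attention along the time axis at each spatial position, then along the spatial axes within each frame. The NumPy sketch below illustrates only that generic factorization under stated assumptions: it is not the FSTT implementation, and the function names, the (T, H, W, C) layout, the random projection weights, the residual connections, and the temporal-then-spatial ordering are all hypothetical. It does not model the flow-guided feed-forward or the window-partitioned Global-Local Temporal MHSA.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def mhsa(tokens, w_qkv, w_out, num_heads):
    """Multi-head self-attention over the token axis.

    tokens: (B, L, C), i.e. B independent sequences of L tokens.
    w_qkv: (C, 3C) and w_out: (C, C) are hypothetical projection weights.
    """
    b, l, c = tokens.shape
    d = c // num_heads
    q, k, v = np.split(tokens @ w_qkv, 3, axis=-1)  # each (B, L, C)

    def split_heads(t):
        # (B, L, C) -> (B, heads, L, d)
        return t.reshape(b, l, num_heads, d).transpose(0, 2, 1, 3)

    q, k, v = split_heads(q), split_heads(k), split_heads(v)
    attn = softmax(q @ k.transpose(0, 1, 3, 2) / np.sqrt(d))  # (B, heads, L, L)
    out = (attn @ v).transpose(0, 2, 1, 3).reshape(b, l, c)   # merge heads
    return out @ w_out

def decomposed_st_attention(x, params, num_heads):
    """Temporal MHSA per spatial position, then spatial MHSA per frame.

    x: (T, H, W, C) feature tensor; params holds (w_qkv, w_out) pairs
    under the hypothetical keys "temporal" and "spatial".
    """
    t, h, w, c = x.shape
    # Temporal step: each of the H*W positions attends across the T frames.
    seq = x.transpose(1, 2, 0, 3).reshape(h * w, t, c)
    seq = seq + mhsa(seq, *params["temporal"], num_heads)  # residual
    x = seq.reshape(h, w, t, c).transpose(2, 0, 1, 3)
    # Spatial step: within each frame, the H*W tokens attend to each other.
    seq = x.reshape(t, h * w, c)
    seq = seq + mhsa(seq, *params["spatial"], num_heads)  # residual
    return seq.reshape(t, h, w, c)

# Usage with random features and weights (illustrative sizes only).
rng = np.random.default_rng(0)
T, H, W, C, HEADS = 5, 4, 6, 8, 2

def init():
    return (rng.standard_normal((C, 3 * C)) * 0.1,
            rng.standard_normal((C, C)) * 0.1)

params = {"temporal": init(), "spatial": init()}
x = rng.standard_normal((T, H, W, C))
y = decomposed_st_attention(x, params, HEADS)
print(y.shape)  # (5, 4, 6, 8)
```

The motivation for factorizing is cost: joint attention over all T·H·W tokens builds a score matrix that grows as (T·H·W)², whereas this sketch builds a T×T matrix per spatial position and an (H·W)×(H·W) matrix per frame. Because no positional encoding is added here, reordering the input frames simply reorders the output.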
Pages: 20