FSTT: Flow-Guided Spatial Temporal Transformer for Deep Video Inpainting

Cited by: 5
Authors
Liu, Ruixin [1]
Zhu, Yuesheng [1]
Affiliations
[1] Peking Univ, Shenzhen Grad Sch, Commun & Informat Secur Lab, Shenzhen 518055, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
deep video inpainting; video editing; spatial temporal transformer; optical flow; object removal;
DOI
10.3390/electronics12214452
Chinese Library Classification
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Video inpainting aims to fill missing regions with content that is consistent both spatially and temporally, so effective use of the spatio-temporal information in videos is critical. Recent video inpainting methods combine optical flow and transformers to capture this information. However, these methods do not fully exploit optical flow within the transformer, and their transformer blocks cannot effectively integrate spatio-temporal information across frames. To address these problems, we propose a novel video inpainting model, the Flow-Guided Spatial Temporal Transformer (FSTT), which establishes correspondences between missing regions and valid regions in both the spatial and temporal dimensions under the guidance of completed optical flow. Specifically, a Flow-Guided Fusion Feed-Forward module is developed to enhance features with the assistance of optical flow, mitigating the inaccuracies caused by hole pixels when performing multi-head self-attention (MHSA). Additionally, a decomposed spatio-temporal MHSA module is proposed to capture spatio-temporal dependencies in videos. To improve the efficiency of the model, a Global-Local Temporal MHSA module is further designed based on the window partition strategy. Extensive quantitative and qualitative experiments on the DAVIS and YouTube-VOS datasets demonstrate the superiority of the proposed method.
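The abstract does not spell out how the decomposed spatio-temporal MHSA is built, so the following is only a minimal NumPy sketch of the general idea of decomposing attention into a spatial pass and a temporal pass. It is not the authors' implementation. The function names, the single-head formulation, the identity query/key/value projections, and the residual connections are all assumptions made for brevity.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def self_attention(tokens):
    """Single-head scaled dot-product self-attention.

    tokens has shape (..., L, C); attention runs over the L axis and all
    leading axes are treated as batch dimensions. Identity projections
    (Q = K = V = tokens) are an assumption to keep the sketch short.
    """
    scale = 1.0 / np.sqrt(tokens.shape[-1])
    scores = tokens @ np.swapaxes(tokens, -1, -2) * scale  # (..., L, L)
    return softmax(scores, axis=-1) @ tokens


def decomposed_spatio_temporal_attention(x):
    """x has shape (T, N, C): T frames, N spatial tokens per frame, C channels."""
    # Spatial pass: each frame attends over its own N tokens.
    x = x + self_attention(x)
    # Temporal pass: each spatial location attends over the T frames.
    xt = np.swapaxes(x, 0, 1)  # (N, T, C)
    xt = xt + self_attention(xt)
    return np.swapaxes(xt, 0, 1)  # back to (T, N, C)


rng = np.random.default_rng(0)
video_tokens = rng.standard_normal((4, 6, 8))  # hypothetical sizes: T=4, N=6, C=8
out = decomposed_spatio_temporal_attention(video_tokens)
print(out.shape)  # (4, 6, 8)
```

In this sketch, joint attention over all T·N tokens would need (T·N)² attention scores, whereas the decomposed form needs T·N² + N·T², which is one common motivation for splitting the two dimensions.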
Pages: 20
Related papers
50 in total
  • [41] DEEP VIDEO INPAINTING GUIDED BY AUDIO-VISUAL SELF-SUPERVISION
    Kim, Kyuyeon
    Jung, Junsik
    Kim, Woo Jae
    Yoon, Sung-Eui
    2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 1970 - 1974
  • [42] FLOW-GUIDED DEFORMABLE ATTENTION NETWORK FOR FAST ONLINE VIDEO SUPER-RESOLUTION
    Yang, Xi
    Zhang, Xindong
    Zhang, Lei
    2023 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, ICIP, 2023, : 390 - 394
  • [43] BSRT: Improving Burst Super-Resolution with Swin Transformer and Flow-Guided Deformable Alignment
    Luo, Ziwei
    Li, Youwei
    Cheng, Shen
    Yu, Lei
    Wu, Qi
    Wen, Zhihong
    Fan, Haoqiang
    Sun, Jian
    Liu, Shuaicheng
    2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS, CVPRW 2022, 2022, : 997 - 1007
  • [44] High resolution video inpainting based on spatial structure and temporal edge information
    Bo, Dezhi
    Ma, Ran
    Wang, Keke
    Li, Kai
    An, Ping
    OPTOELECTRONIC IMAGING AND MULTIMEDIA TECHNOLOGY VII, 2020, 11550
  • [45] Spatial-Temporal Transformer for Video Snapshot Compressive Imaging
    Wang, Lishun
    Cao, Miao
    Zhong, Yong
    Yuan, Xin
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2023, 45 (07) : 9072 - 9089
  • [46] ShiftFormer: Spatial-Temporal Shift Operation in Video Transformer
    Yang, Beiying
    Zhu, Guibo
    Ge, Guojing
    Luo, Jinzhao
    Wang, Jinqiao
    2023 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO, ICME, 2023, : 1895 - 1900
  • [47] Attention-guided video super-resolution with recurrent multi-scale spatial–temporal transformer
    Wei Sun
    Xianguang Kong
    Yanning Zhang
    Complex & Intelligent Systems, 2023, 9 : 3989 - 4002
  • [48] Aggregating multi-scale flow-enhanced information in transformer for video inpainting
    Li, Guanxiao
    Zhang, Ke
    Su, Yu
    Wang, Jingyu
    MULTIMEDIA SYSTEMS, 2025, 31 (01)
  • [49] Local flow propagation and global multi-scale dilated Transformer for video inpainting
    Zuo, Yuting
    Chen, Jing
    Wang, Kaixing
    Lin, Qi
    Zeng, Huanqiang
    JOURNAL OF VISUAL COMMUNICATION AND IMAGE REPRESENTATION, 2025, 107
  • [50] Optical Flow-Guided Deep Convolutional Neural Networks for UAV Detection in Infrared Videos
    Yang, Xin
    Wang, Yi-zheng
    Wang, Gang
    2022 IEEE 7TH INTERNATIONAL CONFERENCE ON INTELLIGENT TRANSPORTATION ENGINEERING, ICITE, 2022, : 457 - 461