FVIFormer: Flow-Guided Global-Local Aggregation Transformer Network for Video Inpainting

Cited by: 4
Authors
Yan, Weiqing [1 ]
Sun, Yiqiu [1 ]
Yue, Guanghui [2 ]
Zhou, Wei [3 ]
Liu, Hantao [3 ]
Affiliations
[1] Yantai Univ, Sch Comp & Control Engn, Yantai 261400, Peoples R China
[2] Shenzhen Univ, Med Sch, Sch Biomed Engn, Shenzhen 518060, Peoples R China
[3] Cardiff Univ, Sch Comp Sci & Informat, Cardiff CF24 4AG, Wales
Funding
National Natural Science Foundation of China;
Keywords
Machine learning--deep learning; OBJECT REMOVAL; IMAGE;
DOI
10.1109/JETCAS.2024.3392972
Chinese Library Classification (CLC)
TM [Electrical Engineering]; TN [Electronic & Communication Technology];
Discipline Codes
0808; 0809;
Abstract
Video inpainting has been used extensively in recent years. Established works usually exploit the similarity between the missing region and its surrounding features to inpaint the visually damaged content in a multi-stage manner. However, due to the complexity of video content, this may destroy the structural information of objects within the video. In addition, moving objects in the damaged regions of the video further increase the difficulty of the task. To address these issues, we propose a flow-guided global-local aggregation Transformer network for video inpainting. First, we use a pre-trained optical flow completion network to repair the defective optical flow of video frames. Then, we propose a content inpainting module, which uses the completed optical flow as a guide and propagates global content across video frames with efficient temporal and spatial Transformers to inpaint the corrupted regions of the video. Finally, we propose a structural rectification module that enhances the coherence of content around the missing regions by combining the extracted local and global features. In addition, considering the efficiency of the overall framework, we also optimize the self-attention mechanism with depth-wise separable encoding to improve the speed of training and testing. We validate the effectiveness of our method on the YouTube-VOS and DAVIS video datasets. Extensive experimental results demonstrate the effectiveness of our approach in completing the edge regions of video content that has undergone stabilisation algorithms.
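The abstract's efficiency claim rests on replacing dense feature encoding in self-attention with depth-wise separable convolutions, which factor a standard convolution into a per-channel spatial filter followed by a 1x1 channel-mixing step. The paper's actual layer definitions are not reproduced here; the following is a minimal NumPy sketch (function names and shapes are this sketch's assumptions, not the authors' code) showing both the operation and the parameter reduction it buys.

```python
import numpy as np

def depthwise_separable_conv(x, dw_kernels, pw_weights):
    """Depthwise-separable 2D convolution (stride 1, no padding).

    x          : (C_in, H, W) input feature map
    dw_kernels : (C_in, k, k) one spatial kernel per input channel
    pw_weights : (C_out, C_in) 1x1 pointwise channel-mixing weights
    """
    c_in, h, w = x.shape
    k = dw_kernels.shape[1]
    oh, ow = h - k + 1, w - k + 1
    # Depthwise stage: each channel is filtered independently.
    dw_out = np.empty((c_in, oh, ow))
    for c in range(c_in):
        for i in range(oh):
            for j in range(ow):
                dw_out[c, i, j] = np.sum(x[c, i:i + k, j:j + k] * dw_kernels[c])
    # Pointwise stage: a 1x1 conv mixes channels at every spatial position.
    return np.einsum('oc,chw->ohw', pw_weights, dw_out)

def conv_param_counts(c_in, c_out, k):
    """Weight counts of a standard k x k conv vs. its separable factorization."""
    standard = c_in * c_out * k * k          # full channel-spatial coupling
    separable = c_in * k * k + c_in * c_out  # depthwise + pointwise
    return standard, separable
```

For a 64-to-64-channel 3x3 layer, `conv_param_counts(64, 64, 3)` gives 36864 standard weights versus 4672 for the separable form, roughly an 8x reduction, which is the kind of saving that makes per-frame query/key/value encoding cheaper inside a video Transformer.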
Pages: 235-244
Page count: 10
Related Papers
50 records in total
  • [41] FGC-VC: FLOW-GUIDED CONTEXT VIDEO COMPRESSION
    Wang, Yiming
    Huang, Qian
    Tang, Bin
    Sun, Huashan
    Guo, Xiaotong
    2023 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, ICIP, 2023, : 3175 - 3179
  • [42] ConvTransNet: A CNN-Transformer Network for Change Detection With Multiscale Global-Local Representations
    Li, Weiming
    Xue, Lihui
    Wang, Xueqian
    Li, Gang
    IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2023, 61
  • [43] WTVI: A Wavelet-Based Transformer Network for Video Inpainting
    Zhang, Ke
    Li, Guanxiao
    Su, Yu
    Wang, Jingyu
    IEEE SIGNAL PROCESSING LETTERS, 2024, 31 : 616 - 620
  • [44] G2LP-Net: Global to Local Progressive Video Inpainting Network
    Ji, Zhong
    Hou, Jiacheng
    Su, Yimu
    Pang, Yanwei
    Li, Xuelong
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2023, 33 (03) : 1082 - 1092
  • [45] Spatio-Temporal Inference Transformer Network for Video Inpainting
    Tudavekar, Gajanan
    Saraf, Santosh S.
    Patil, Sanjay R.
    INTERNATIONAL JOURNAL OF IMAGE AND GRAPHICS, 2023, 23 (01)
  • [46] Local and global mixture network for image inpainting
    Woo, Seunggyun
    Ko, Keunsoo
    Kim, Chang-Su
    JOURNAL OF VISUAL COMMUNICATION AND IMAGE REPRESENTATION, 2024, 104
  • [47] GLCSA-Net: global-local constraints-based spectral adaptive network for hyperspectral image inpainting
    Chen, Hu
    Li, Jia
    Zhang, Junjie
    Fu, Yu
    Yan, Chenggang
    Zeng, Dan
    VISUAL COMPUTER, 2024, 40 (05): : 3331 - 3346
  • [48] Deep global-local transformer network combined with extended morphological profiles for hyperspectral image classification
    Tan, Xiong
    Gao, Kuiliang
    Liu, Bing
    Fu, Yumeng
    Kang, Lei
    JOURNAL OF APPLIED REMOTE SENSING, 2021, 15 (03)
  • [49] GLU-Net: Global-Local Universal Network for Dense Flow and Correspondences
    Truong, Prune
    Danelljan, Martin
    Timofte, Radu
    2020 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2020, : 6257 - 6267
  • [50] Hierarchical Global-Local Temporal Modeling for Video Captioning
    Hu, Yaosi
    Chen, Zhenzhong
    Zha, Zheng-Jun
    Wu, Feng
    PROCEEDINGS OF THE 27TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA (MM'19), 2019, : 774 - 783