FVIFormer: Flow-Guided Global-Local Aggregation Transformer Network for Video Inpainting

Cited by: 4
Authors
Yan, Weiqing [1]
Sun, Yiqiu [1]
Yue, Guanghui [2]
Zhou, Wei [3]
Liu, Hantao [3]
Affiliations
[1] Yantai Univ, Sch Comp & Control Engn, Yantai 261400, Peoples R China
[2] Shenzhen Univ, Med Sch, Sch Biomed Engn, Shenzhen 518060, Peoples R China
[3] Cardiff Univ, Sch Comp Sci & Informat, Cardiff CF24 4AG, Wales
Funding
National Natural Science Foundation of China;
Keywords
Machine learning--deep learning; OBJECT REMOVAL; IMAGE;
DOI
10.1109/JETCAS.2024.3392972
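Chinese Library Classification (CLC) number
TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology];
Discipline classification code
0808 ; 0809 ;
Abstract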
Video inpainting has been extensively used in recent years. Established works usually utilise the similarity between the missing region and its surrounding features to inpaint the visually damaged content in a multi-stage manner. However, owing to the complexity of video content, this can destroy the structural information of objects within the video. In addition, moving objects in the damaged regions make the task more difficult. To address these issues, we propose a flow-guided global-local aggregation Transformer network for video inpainting. First, we use a pre-trained optical flow completion network to repair the defective optical flow of video frames. Then, we propose a content inpainting module, which uses the completed optical flow as a guide and propagates global content across video frames using efficient temporal and spatial Transformers to inpaint the corrupted regions of the video. Finally, we propose a structural rectification module that enhances the coherence of content around the missing regions by combining the extracted local and global features. In addition, for the efficiency of the overall framework, we also optimise the self-attention mechanism via depth-wise separable encoding to improve training and testing speed. We validate our method on the YouTube-VOS and DAVIS video datasets. Extensive experimental results demonstrate the effectiveness of our approach in completing the edges of video content that has undergone stabilisation.
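The abstract credits depth-wise separable encoding for the speed-up but gives no layer sizes. As a rough illustration only, the sketch below compares the weight count of a standard convolution with that of a depth-wise convolution followed by a 1 x 1 point-wise convolution. The channel counts and kernel size are hypothetical values chosen for this example and are not taken from the paper; how FVIFormer actually applies the technique inside self-attention is not specified here.

```python
def standard_conv_params(c_in, c_out, k):
    """Weights in a standard k x k convolution (bias omitted)."""
    return k * k * c_in * c_out


def depthwise_separable_params(c_in, c_out, k):
    """Weights in a depth-wise k x k conv followed by a 1 x 1 point-wise conv."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 conv that mixes channels
    return depthwise + pointwise


# Hypothetical sizes for illustration only (not from the paper).
c_in, c_out, k = 256, 256, 3
std = standard_conv_params(c_in, c_out, k)
sep = depthwise_separable_params(c_in, c_out, k)
print(f"standard: {std}, separable: {sep}, ratio: {std / sep:.2f}")
```

With these assumed sizes the separable form needs roughly an order of magnitude fewer weights, which is the general reason such encodings tend to reduce training and inference cost.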
Pages: 235 - 244
Page count: 10
Related papers
50 in total
  • [31] An Adaptive Post-Processing Network With the Global-Local Aggregation for Semantic Segmentation
    Zhu, Guilin
    Wang, Runmin
    Liu, Yingying
    Zhu, Zhenlin
    Gao, Changxin
    Liu, Li
    Sang, Nong
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2024, 34 (02) : 1159 - 1173
  • [32] Structure Guided Global and Local Attention Transformer for Image Inpainting of Obscured Ships in Maritime Surveillance
    Baek, Woonyoung
    Kang, Sanggil
    Yang, Young-Hoon
    IEEE ACCESS, 2024, 12 : 101999 - 102015
  • [33] Video Captioning Using Global-Local Representation
    Yan, Liqi
    Ma, Siqi
    Wang, Qifan
    Chen, Yingjie
    Zhang, Xiangyu
    Savakis, Andreas
    Liu, Dongfang
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2022, 32 (10) : 6642 - 6656
  • [34] GLOBAL-LOCAL DETAIL GUIDED TRANSFORMER FOR SEA ICE RECOGNITION IN OPTICAL REMOTE SENSING IMAGES
    Huang, Zhanchao
    Hong, Wenjun
    Su, Hua
    IGARSS 2024-2024 IEEE INTERNATIONAL GEOSCIENCE AND REMOTE SENSING SYMPOSIUM, IGARSS 2024, 2024, : 1768 - 1772
  • [35] Temporal context video compression with flow-guided feature prediction
    Wang, Yiming
    Huang, Qian
    Tang, Bin
    Sun, Huashan
    Guo, Xiaotong
    Miao, Zhuang
    EXPERT SYSTEMS WITH APPLICATIONS, 2024, 247
  • [36] Global-Local 3-D Convolutional Transformer Network for Hyperspectral Image Classification
    Qi, Wenchao
    Huang, Changping
    Wang, Yibo
    Zhang, Xia
    Sun, Weiwei
    Zhang, Lifu
    IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2023, 61
  • [37] Contrastive Learning of Global-Local Video Representations
    Ma, Shuang
    Zeng, Zhaoyang
    McDuff, Daniel
    Song, Yale
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021), 2021, 34
  • [38] Flow-guided feature enhancement network for video-based person re-identification
    Gong, Weichao
    Yan, Bo
    Lin, Chuming
    NEUROCOMPUTING, 2020, 383 : 295 - 302
  • [39] Flow-Guided Diffusion Autoencoder for Unsupervised Video Anomaly Detection
    Zhu, Aoni
    Wang, Wenjun
    Yan, Cheng
    PATTERN RECOGNITION AND COMPUTER VISION, PRCV 2023, PT VI, 2024, 14430 : 183 - 194
  • [40] Global-Local Multigranularity Transformer for Hyperspectral Image Classification
    Meng, Zhe
    Yan, Qian
    Zhao, Feng
    Chen, Gaige
    Hua, Wenqiang
    Liang, Miaomiao
    IEEE JOURNAL OF SELECTED TOPICS IN APPLIED EARTH OBSERVATIONS AND REMOTE SENSING, 2025, 18 : 112 - 131