SwinVI: 3D Swin Transformer Model with U-net for Video Inpainting

Cited by: 0
Authors
Zhang, Wei [1 ]
Cao, Yang [1 ]
Zhai, Junhai [1 ]
Affiliations
[1] Hebei Univ, Coll Math & Informat Sci, Hebei Key Lab Machine Learning & Computat Intelli, Baoding, Peoples R China
Keywords
Transformer; Video inpainting; Spatio-temporal
DOI
10.1109/IJCNN54540.2023.10192024
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
The goal of video inpainting is to fill in locally missing regions of a given video as realistically as possible. It remains a challenging task, even with powerful deep learning methods. In recent years, Transformers have been introduced to video inpainting and have achieved remarkable improvements, but they still suffer from two problems: blurry synthesized textures and high computational cost. To address both problems, we propose SwinVI, a new 3D Swin Transformer model with a U-net structure that improves the quality of video inpainting efficiently. We modify the vanilla Swin Transformer by extending its standard self-attention mechanism to a 3D self-attention mechanism, which enables the model to process spatial and temporal information simultaneously. SwinVI is built as a U-net whose encoder uses 3D patch merging and whose decoder uses a CNN-based upsampling module, providing an end-to-end learning framework. This structural design allows SwinVI to attend to both background textures and moving objects and to learn robust, more representative token vectors, thereby improving inpainting quality significantly and efficiently. We experimentally compare SwinVI with multiple methods on two challenging benchmarks. The results demonstrate that SwinVI outperforms state-of-the-art methods in RMSE, SSIM, and PSNR.
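The abstract's core idea, extending Swin's windowed self-attention from 2D image patches to 3D spatio-temporal windows, can be illustrated with a minimal PyTorch sketch. The class name WindowAttention3D, the window size (2, 7, 7), and all tensor shapes below are hypothetical assumptions for illustration only; this is not the authors' released implementation of SwinVI.

```python
# Minimal sketch of 3D (spatio-temporal) window self-attention, in the spirit of
# extending Swin's 2D windowed attention to video clips. Hypothetical names/shapes.
import torch
import torch.nn as nn


class WindowAttention3D(nn.Module):
    """Multi-head self-attention over non-overlapping 3D windows (T, H, W)."""

    def __init__(self, dim, window_size=(2, 7, 7), num_heads=4):
        super().__init__()
        self.window_size = window_size  # (Wt, Wh, Ww)
        self.num_heads = num_heads
        self.scale = (dim // num_heads) ** -0.5
        self.qkv = nn.Linear(dim, dim * 3)
        self.proj = nn.Linear(dim, dim)

    def forward(self, x):
        # x: (B, T, H, W, C) token grid of a video clip; T, H, W must be
        # divisible by the corresponding window size in this simplified sketch.
        B, T, H, W, C = x.shape
        wt, wh, ww = self.window_size

        # Partition the clip into non-overlapping 3D windows of wt*wh*ww tokens.
        x = x.view(B, T // wt, wt, H // wh, wh, W // ww, ww, C)
        x = x.permute(0, 1, 3, 5, 2, 4, 6, 7).reshape(-1, wt * wh * ww, C)

        # Standard multi-head attention inside each window.
        Bn, N, _ = x.shape
        qkv = self.qkv(x).reshape(Bn, N, 3, self.num_heads, C // self.num_heads)
        q, k, v = qkv.permute(2, 0, 3, 1, 4)          # each: (Bn, heads, N, C/heads)
        attn = ((q @ k.transpose(-2, -1)) * self.scale).softmax(dim=-1)
        x = (attn @ v).transpose(1, 2).reshape(Bn, N, C)
        x = self.proj(x)

        # Reverse the window partition back to (B, T, H, W, C).
        x = x.view(B, T // wt, H // wh, W // ww, wt, wh, ww, C)
        x = x.permute(0, 1, 4, 2, 5, 3, 6, 7).reshape(B, T, H, W, C)
        return x


if __name__ == "__main__":
    # Toy clip: batch 1, 4 frames, 28x28 token grid, 96-dim tokens.
    clip = torch.randn(1, 4, 28, 28, 96)
    out = WindowAttention3D(dim=96, window_size=(2, 7, 7), num_heads=4)(clip)
    print(out.shape)  # torch.Size([1, 4, 28, 28, 96])
```

Compared with frame-by-frame 2D windows, each attention window here spans several consecutive frames, so tokens can aggregate information across time as well as space, which is the property the abstract attributes to the 3D self-attention mechanism.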
Pages: 8
Related Papers
50 records in total
  • [1] PARSE CHALLENGE 2022: PULMONARY ARTERIES SEGMENTATION USING SWIN U-NET TRANSFORMER (SWIN UNETR) AND U-NET
    Padhy, Rohan
    Maurya, Akansh
    Patil, Kunal Dasharath
    Ramakrishna, Kalluri
    Krishnamurthi, Ganapathy
    2023 IEEE 20TH INTERNATIONAL SYMPOSIUM ON BIOMEDICAL IMAGING, ISBI, 2023,
  • [2] Swin-CasUNet: Cascaded U-Net with Swin Transformer for Masked Face Restoration
    Zeng, Chengbin
    Liu, Yi
    Song, Chunli
    2022 26TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2022, : 386 - 392
  • [3] STU3Net: An Improved U-Net With Swin Transformer Fusion for Thyroid Nodule Segmentation
    Deng, Xiangyu
    Dang, Zhiyan
    Pan, Lihao
    INTERNATIONAL JOURNAL OF IMAGING SYSTEMS AND TECHNOLOGY, 2024, 34 (05)
  • [4] Image denoising based on Swin Transformer Residual Conv U-Net
    Gan, Yong
    Zhou, Shaohui
    Chen, Haonan
    Wang, Yuefeng
    27TH IEEE/ACIS INTERNATIONAL SUMMER CONFERENCE ON SOFTWARE ENGINEERING ARTIFICIAL INTELLIGENCE NETWORKING AND PARALLEL/DISTRIBUTED COMPUTING, SNPD 2024-SUMMER, 2024, : 166 - 170
  • [5] A 3D U-Net Based on a Vision Transformer for Radar Semantic Segmentation
    Zhang, Tongrui
    Fan, Yunsheng
    SENSORS, 2023, 23 (24)
  • [6] On Improving 3D U-net Architecture
    Janovsky, Roman
    Sedlacek, David
    Zara, Jiri
    ICSOFT: PROCEEDINGS OF THE 14TH INTERNATIONAL CONFERENCE ON SOFTWARE TECHNOLOGIES, 2019, : 649 - 656
  • [7] An attention based residual U-Net with swin transformer for brain MRI segmentation
    Angona, Tazkia Mim
    Mondal, M. Rubaiyat Hossain
    ARRAY, 2025, 25
  • [8] Swin Deformable Attention U-Net Transformer (SDAUT) for Explainable Fast MRI
    Huang, Jiahao
    Xing, Xiaodan
    Gao, Zhifan
    Yang, Guang
    MEDICAL IMAGE COMPUTING AND COMPUTER ASSISTED INTERVENTION, MICCAI 2022, PT VI, 2022, 13436 : 538 - 548
  • [9] Adaptive enhanced swin transformer with U-net for remote sensing image segmentation*
    Gu, Xingjian
    Li, Sizhe
    Ren, Shougang
    Zheng, Hengbiao
    Fan, Chengcheng
    Xu, Huanliang
    COMPUTERS & ELECTRICAL ENGINEERING, 2022, 102
  • [10] Efficient combined algorithm of Transformer and U-Net for 3D medical image segmentation
    Zhang, Mingyan
    Wang, Aixia
    Yang, Gang
    Li, Jingjiao
    2023 35TH CHINESE CONTROL AND DECISION CONFERENCE, CCDC, 2023, : 4377 - 4382