SwinVI: 3D Swin Transformer Model with U-net for Video Inpainting

Cited by: 0
Authors
Zhang, Wei [1]
Cao, Yang [1]
Zhai, Junhai [1]
Affiliations
[1] Hebei Univ, Coll Math & Informat Sci, Hebei Key Lab Machine Learning & Computat Intelli, Baoding, Peoples R China
Keywords
Transformer; Video inpainting; Spatio-temporal
DOI
10.1109/IJCNN54540.2023.10192024
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
The goal of video inpainting is to fill in the missing regions of a given video as realistically as possible. It remains a challenging task, even with powerful deep learning methods. In recent years, Transformers have been introduced to video inpainting, and remarkable improvements have been achieved. However, Transformer-based methods still suffer from two problems: they generate blurry textures and incur a high computational cost. To address these two problems, we propose a new 3D Swin Transformer model with U-net (SwinVI) to improve the quality of video inpainting efficiently. We modify the vanilla Swin Transformer by extending the standard self-attention mechanism to a 3D self-attention mechanism, which enables the modified model to process spatial and temporal information simultaneously. SwinVI consists of a U-net implemented with 3D Patch Merge and a CNN-equipped upsampling module, which provides an end-to-end learning framework. This structural design enables SwinVI to attend to both background textures and moving objects and to learn robust, more representative token vectors, which in turn significantly improves the quality of video inpainting efficiently. We experimentally compare SwinVI with multiple methods on two challenging benchmarks. The experimental results show that the proposed SwinVI outperforms state-of-the-art methods in RMSE, SSIM, and PSNR.
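The abstract reports results in RMSE, SSIM, and PSNR. As a rough illustration of the two simpler metrics, the sketch below computes RMSE and PSNR for a pair of flattened frames. It is not the authors' evaluation code: the function names, the 8-bit peak value of 255, and the toy pixel values are all assumptions made for this example, and SSIM is left out because it needs windowed statistics.

```python
import math


def rmse(reference, restored):
    """Root-mean-square error between two equally sized flat pixel sequences."""
    if len(reference) != len(restored) or not reference:
        raise ValueError("inputs must be non-empty and the same length")
    squared = sum((a - b) ** 2 for a, b in zip(reference, restored))
    return math.sqrt(squared / len(reference))


def psnr(reference, restored, max_value=255.0):
    """Peak signal-to-noise ratio in dB; infinite when the inputs match exactly."""
    error = rmse(reference, restored)
    if error == 0:
        return math.inf
    return 20.0 * math.log10(max_value / error)


# Toy "frames" flattened to pixel lists (hypothetical values, not from the paper).
ground_truth = [10, 20, 30, 40]
inpainted = [12, 18, 33, 37]

print("RMSE:", rmse(ground_truth, inpainted))
print("PSNR (dB):", psnr(ground_truth, inpainted))
```

Lower RMSE and higher PSNR both indicate that the inpainted frame is closer to the ground truth; for a video, the per-frame values would typically be averaged over all frames.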
Pages: 8