STD-Net: Spatio-Temporal Decomposition Network for Video Demoireing With Sparse Transformers

Cited by: 0
Authors
Niu, Yuzhen [1 ,2 ]
Xu, Rui [1 ,2 ]
Lin, Zhihua [3 ]
Liu, Wenxi [1 ,2 ]
Affiliations
[1] Fuzhou Univ, Coll Comp & Data Sci, Fujian Key Lab Network Comp & Intelligent Informa, Fuzhou 350108, Peoples R China
[2] Minist Educ, Engn Res Ctr Bigdata Intelligence, Fuzhou 350108, Peoples R China
[3] Res Inst Alipay Informat Technol Co Ltd, Hangzhou 310000, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Image restoration; video demoireing; video restoration; spatio-temporal network; sparse transformer; QUALITY ASSESSMENT; IMAGE;
DOI
10.1109/TCSVT.2024.3386604
Chinese Library Classification (CLC)
TM [Electrical Technology]; TN [Electronic Technology, Communication Technology];
Subject Classification Code
0808 ; 0809 ;
Abstract
The problem of video demoireing is a new challenge in video restoration. Unlike image demoireing, which involves removing static and uniform patterns, video demoireing requires tackling dynamic and varied moire patterns while maintaining video details, colors, and temporal consistency. It is particularly challenging to model moire patterns for videos with camera or object motions, where separating moire from the original video content across frames is extremely difficult. Nonetheless, we observe that the spatial distribution of moire patterns is often sparse on each frame, and their long-range temporal correlation is not significant. To fully leverage this phenomenon, a sparsity-constrained spatial self-attention scheme is proposed to concentrate on efficiently removing sparse moire patterns on each frame without being distracted by dynamic video content. The frame-wise spatial features are then correlated and aggregated via a local temporal cross-frame-attention module to produce temporally consistent, high-quality moire-free videos. These decoupled spatial and temporal transformers constitute the Spatio-Temporal Decomposition Network, dubbed STD-Net. For evaluation, we present a large-scale video demoireing benchmark featuring various real-life scenes, camera motions, and object motions. We demonstrate that our proposed model effectively and efficiently achieves superior performance on both video demoireing and single-image demoireing tasks. The proposed dataset is released at https://github.com/FZU-N/LVDM.
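The abstract names two decoupled mechanisms: a per-frame sparsity-constrained spatial self-attention and a local temporal cross-frame attention. The snippet below is an illustrative numpy sketch of those two ideas only, not the authors' implementation: sparsity is approximated here by keeping the top-k attention scores per query (the paper's exact constraint may differ), and the temporal module attends across a small window of neighboring frames at the same spatial location. All function names and the window parameter `radius` are assumptions for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def sparse_spatial_attention(frame_tokens, k):
    """Per-frame self-attention where each query attends only to its
    top-k highest-scoring keys (a stand-in for a sparsity-constrained
    spatial self-attention). frame_tokens: (n_tokens, dim)."""
    n, d = frame_tokens.shape
    scores = frame_tokens @ frame_tokens.T / np.sqrt(d)      # (n, n)
    # keep the top-k scores per query row; mask the rest to -inf
    # (ties at the k-th score may keep a few extra entries)
    kth = np.sort(scores, axis=-1)[:, -k][:, None]
    masked = np.where(scores >= kth, scores, -np.inf)
    return softmax(masked, axis=-1) @ frame_tokens

def local_temporal_cross_attention(video_tokens, radius=1):
    """Each frame's tokens attend to the same spatial location in
    neighboring frames within `radius` (a local temporal window).
    video_tokens: (n_frames, n_tokens, dim)."""
    t, n, d = video_tokens.shape
    out = np.empty_like(video_tokens)
    for i in range(t):
        lo, hi = max(0, i - radius), min(t, i + radius + 1)
        neigh = video_tokens[lo:hi]                          # (w, n, d)
        q = video_tokens[i]                                  # (n, d)
        # per spatial location: attend across the temporal window
        scores = np.einsum('nd,wnd->nw', q, neigh) / np.sqrt(d)
        w = softmax(scores, axis=-1)                         # (n, w)
        out[i] = np.einsum('nw,wnd->nd', w, neigh)
    return out

# Usage sketch: spatial attention per frame, then temporal aggregation.
rng = np.random.default_rng(0)
video = rng.standard_normal((4, 8, 16))                      # 4 frames, 8 tokens, dim 16
spatial = np.stack([sparse_spatial_attention(f, k=3) for f in video])
fused = local_temporal_cross_attention(spatial, radius=1)
```

A small temporal radius reflects the abstract's observation that long-range temporal correlation of moire patterns is weak, so only adjacent frames need to be aggregated.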
Pages: 8562-8575
Page count: 14
Related Papers
50 items in total
  • [1] Pan, Yunfeng; Jiang, Qiuping; Li, Zhutuan; Shao, Feng. Video saliency detection by spatio-temporal sampling and sparse matrix decomposition. WSEAS Transactions on Computers, 2014, 13: 520-527
  • [2] Yang, Antoine; Miech, Antoine; Sivic, Josef; Laptev, Ivan; Schmid, Cordelia. TubeDETR: Spatio-Temporal Video Grounding with Transformers. 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2022), 2022: 16421-16432
  • [3] Liu, Tongxing; Qiu, Guoxin; Xuan, Hanyu. DSTA-Net: Deformable Spatio-Temporal Attention Network for Video Inpainting. IEEE Signal Processing Letters, 2025, 32: 771-775
  • [4] Kim, Tae Hyun; Sajjadi, Mehdi S. M.; Hirsch, Michael; Schoelkopf, Bernhard. Spatio-Temporal Transformer Network for Video Restoration. Computer Vision - ECCV 2018, Pt III, 2018, 11207: 111-127
  • [5] Barceló, L; Orriols, X; Binefa, X. Spatio-temporal decomposition of sport events for video indexing. Image and Video Retrieval, Proceedings, 2003, 2728: 435-445
  • [6] Zhu, Qi; Xiao, Zeyu; Huang, Jie; Zhao, Feng. Dast-Net: Depth-Aware Spatio-Temporal Network for Video Deblurring. Proceedings - IEEE International Conference on Multimedia and Expo, 2022, 2022-July
  • [7] Tang, Zongheng; Liao, Yue; Liu, Si; Li, Guanbin; Jin, Xiaojie; Jiang, Hongxu; Yu, Qian; Xu, Dong. Human-Centric Spatio-Temporal Video Grounding With Visual Transformers. IEEE Transactions on Circuits and Systems for Video Technology, 2022, 32 (12): 8238-8249
  • [8] Chen, Zhenghao; Relic, Lucas; Azevedo, Roberto; Zhang, Yang; Gross, Markus; Xu, Dong; Zhou, Luping; Schroers, Christopher. Neural Video Compression with Spatio-Temporal Cross-Covariance Transformers. Proceedings of the 31st ACM International Conference on Multimedia, MM 2023, 2023: 8543-8551
  • [9] Gritsenko, Alexey A.; Xiong, Xuehan; Djolonga, Josip; Dehghani, Mostafa; Sun, Chen; Lucic, Mario; Schmid, Cordelia; Arnab, Anurag. End-to-End Spatio-Temporal Action Localisation with Video Transformers. 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024: 18373-18383
  • [10] Lindh, P; Lambrecht, CJVB. Efficient spatio-temporal decomposition for perceptual processing of video sequences. International Conference on Image Processing, Proceedings - Vol III, 1996: 331-334