Panoramic Vision Transformer for Saliency Detection in 360° Videos

Cited by: 11
Authors
Yun, Heeseung [1 ]
Lee, Sehun [1 ]
Kim, Gunhee [1 ]
Affiliations
[1] Seoul Natl Univ, Seoul, South Korea
Keywords
360° videos; Saliency detection; Vision transformer
DOI
10.1007/978-3-031-19833-5_25
CLC number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
360° video saliency detection is a challenging benchmark for 360° video understanding, since any projection of a 360° video introduces non-negligible distortion and discontinuity, and capture-worthy viewpoints on the omnidirectional sphere are ambiguous by nature. We present a new framework named Panoramic Vision Transformer (PAVER). We design the encoder using a Vision Transformer with deformable convolution, which enables us not only to plug pretrained models from normal videos into our architecture without additional modules or finetuning, but also to perform the geometric approximation only once, unlike previous deep CNN-based approaches. Thanks to its powerful encoder, PAVER can learn saliency from three simple relative relations among local patch features, outperforming state-of-the-art models on the Wild360 benchmark by large margins without supervision or auxiliary information such as class activation. We demonstrate the utility of our saliency prediction model on the omnidirectional video quality assessment task in VQA-ODV, where it consistently improves performance without any form of supervision, including head movement.
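The abstract's key architectural idea is a ViT encoder whose patch embedding uses deformable convolution, with the geometric approximation for the panoramic projection computed only once. The PyTorch sketch below illustrates one plausible way such a panoramic patch embedding could look; the class name PanoramicPatchEmbed, the latitude-dependent offset heuristic, and all hyperparameters are illustrative assumptions and not taken from PAVER's released implementation.

import torch
import torch.nn as nn
from torchvision.ops import DeformConv2d

class PanoramicPatchEmbed(nn.Module):
    """ViT-style patchify layer whose sampling grid is warped by fixed,
    precomputed offsets to compensate for equirectangular distortion.
    The latitude-based stretch below is an illustrative placeholder,
    not PAVER's exact geometric approximation."""

    def __init__(self, img_size=(224, 448), patch_size=16, in_chans=3, embed_dim=768):
        super().__init__()
        self.proj = DeformConv2d(in_chans, embed_dim,
                                 kernel_size=patch_size, stride=patch_size)
        h_out, w_out = img_size[0] // patch_size, img_size[1] // patch_size
        # Latitude of each output row in radians (-pi/2 at the top row, +pi/2 at the bottom).
        lat = torch.linspace(-torch.pi / 2, torch.pi / 2, h_out)
        # Horizontal stretch that grows toward the poles (placeholder heuristic).
        stretch = 1.0 / torch.cos(lat).clamp(min=0.2) - 1.0          # (h_out,)
        # Kernel tap positions relative to the kernel center.
        ky, kx = torch.meshgrid(torch.arange(patch_size, dtype=torch.float32),
                                torch.arange(patch_size, dtype=torch.float32),
                                indexing="ij")
        kx = kx - (patch_size - 1) / 2
        # Per-tap (dy, dx) offsets: dy = 0, dx scaled by the row's stretch factor.
        dx = stretch.view(h_out, 1, 1, 1) * kx.view(1, patch_size, patch_size, 1)
        offset = torch.cat([torch.zeros_like(dx), dx], dim=-1)       # (h_out, P, P, 2)
        offset = offset.permute(1, 2, 3, 0).reshape(1, 2 * patch_size ** 2, h_out, 1)
        # Computed once and reused for every frame ("geometric approximation only once").
        self.register_buffer("offset", offset.expand(1, -1, h_out, w_out).contiguous())

    def forward(self, x):                        # x: (B, 3, H, W) equirectangular frame
        offset = self.offset.expand(x.shape[0], -1, -1, -1)
        x = self.proj(x, offset)                 # (B, D, H/P, W/P)
        return x.flatten(2).transpose(1, 2)      # (B, N, D) tokens for a standard ViT

# Example: tokens = PanoramicPatchEmbed()(torch.randn(2, 3, 224, 448))  -> (2, 392, 768)

Because the offsets depend only on the projection geometry, they are registered as a buffer and reused for every frame, mirroring the abstract's claim that the geometric approximation is performed once; the resulting token sequence could then be fed to a standard pretrained ViT without further modification.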
Pages: 422 - 439
Number of pages: 18
Related papers
50 records in total
  • [32] SST-Sal: A spherical spatio-temporal approach for saliency prediction in 360° videos
    Berdun, Edurne Bernal
    Serrano, Daniel Martin
    Perez, Diego Gutierrez
    Corcoy, Belen Masia
    COMPUTERS & GRAPHICS-UK, 2022, 106 : 200 - 209
  • [33] SGDViT: Saliency-Guided Dynamic Vision Transformer for UAV Tracking
    Yao, Liangliang
    Fu, Changhong
    Li, Sihang
    Zheng, Guangze
    Ye, Junjie
    2023 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION, ICRA, 2023, : 3353 - 3359
  • [34] Transformer-Based Fire Detection in Videos
    Mardani, Konstantina
    Vretos, Nicholas
    Daras, Petros
    SENSORS, 2023, 23 (06)
  • [35] MULTI-SCALE TRANSFORMER NETWORK FOR SALIENCY PREDICTION ON 360-DEGREE IMAGES
    Lin, Xu
    Qing, Chunmei
    Tan, Junpeng
    Xu, Xiangmin
    2023 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, ICIP, 2023, : 1700 - 1704
  • [36] Graph learning model for saliency detection in thermal pedestrian videos
    Stojanovic, Vladimir
    Deng, Jian
    Milic, Dunja
    INTERNATIONAL JOURNAL OF SOLIDS AND STRUCTURES, 2023, 270
  • [37] Transformer-based fall detection in videos
    Nunez-Marcos, Adrian
    Arganda-Carreras, Ignacio
    ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2024, 132
  • [38] RELATIONAL ENTROPY-BASED SALIENCY DETECTION IN IMAGES AND VIDEOS
    Duncan, Kester
    Sarkar, Sudeep
    2012 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP 2012), 2012, : 1093 - 1096
  • [39] A Spatio Temporal Texture Saliency Approach for Object Detection in Videos
    Sasithradevi, A.
    Roomi, S. Mohamed Mansoor
    Sanofer, I.
    COMPUTER VISION, GRAPHICS, AND IMAGE PROCESSING, ICVGIP 2016, 2017, 10481 : 63 - 74
  • [40] Prediction of remaining surgery duration in laparoscopic videos based on visual saliency and the transformer network
    Loukas, Constantinos
    Seimenis, Ioannis
    Prevezanou, Konstantina
    Schizas, Dimitrios
    INTERNATIONAL JOURNAL OF MEDICAL ROBOTICS AND COMPUTER ASSISTED SURGERY, 2024, 20 (02)