Panoramic Vision Transformer for Saliency Detection in 360° Videos

Cited by: 11
Authors
Yun, Heeseung [1]
Lee, Sehun [1]
Kim, Gunhee [1]
Affiliations
[1] Seoul Natl Univ, Seoul, South Korea
Keywords
360° videos; Saliency detection; Vision transformer
DOI
10.1007/978-3-031-19833-5_25
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
360° video saliency detection is one of the challenging benchmarks for 360° video understanding, since non-negligible distortion and discontinuity occur in the projection of any format of 360° videos, and the capture-worthy viewpoint in the omnidirectional sphere is ambiguous by nature. We present a new framework named Panoramic Vision Transformer (PAVER). We design the encoder using a Vision Transformer with deformable convolution, which enables us not only to plug pretrained models from normal videos into our architecture without additional modules or finetuning, but also to perform geometric approximation only once, unlike previous deep CNN-based approaches. Thanks to its powerful encoder, PAVER can learn saliency from three simple relative relations among local patch features, outperforming state-of-the-art models on the Wild360 benchmark by large margins without supervision or auxiliary information such as class activation. We demonstrate the utility of our saliency prediction model on the omnidirectional video quality assessment task in VQA-ODV, where we consistently improve performance without any form of supervision, including head movement.
Pages: 422-439
Page count: 18