Panoramic Vision Transformer for Saliency Detection in 360° Videos

Cited by: 11
Authors
Yun, Heeseung [1]
Lee, Sehun [1]
Kim, Gunhee [1]
Affiliations
[1] Seoul Natl Univ, Seoul, South Korea
Source
Keywords
360° videos; Saliency detection; Vision transformer
DOI
10.1007/978-3-031-19833-5_25
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
360° video saliency detection is one of the challenging benchmarks for 360° video understanding since non-negligible distortion and discontinuity occur in the projection of any format of 360° videos, and capture-worthy viewpoint in the omnidirectional sphere is ambiguous by nature. We present a new framework named Panoramic Vision Transformer (PAVER). We design the encoder using Vision Transformer with deformable convolution, which enables us not only to plug pretrained models from normal videos into our architecture without additional modules or finetuning but also to perform geometric approximation only once, unlike previous deep CNN-based approaches. Thanks to its powerful encoder, PAVER can learn the saliency from three simple relative relations among local patch features, outperforming state-of-the-art models for the Wild360 benchmark by large margins without supervision or auxiliary information like class activation. We demonstrate the utility of our saliency prediction model with the omnidirectional video quality assessment task in VQA-ODV, where we consistently improve performance without any form of supervision, including head movement.
Pages: 422-439
Number of pages: 18
Related papers
50 in total
  • [1] Saliency Detection in 360° Videos
    Zhang, Ziheng
    Xu, Yanyu
    Yu, Jingyi
    Gao, Shenghua
    COMPUTER VISION - ECCV 2018, PT VII, 2018, 11211 : 504 - 520
  • [2] Saliency Detection of Panoramic Images Based on Robust Vision Transformer and Multiple Attention
    Chen, Xiaolei
    Zhang, Pengcheng
    Lu, Yubing
    Cao, Baoning
    JOURNAL OF ELECTRONICS & INFORMATION TECHNOLOGY, 2023, 45 (06) : 2246 - 2255
  • [3] Saliency Prediction Network for 360° Videos
    Zhang, Youqiang
    Dai, Feng
    Ma, Yike
    Li, Hongliang
    Zhao, Qiang
    Zhang, Yongdong
    IEEE JOURNAL OF SELECTED TOPICS IN SIGNAL PROCESSING, 2020, 14 (01) : 27 - 37
  • [4] Saliency Computation for Virtual Cinematography in 360° Videos
    Du, Ruofei
    Varshney, Amitabh
    IEEE COMPUTER GRAPHICS AND APPLICATIONS, 2021, 41 (04) : 99 - 106
  • [5] A Saliency Dataset for 360-Degree Videos
    Anh Nguyen
    Yan, Zhisheng
    PROCEEDINGS OF THE 10TH ACM MULTIMEDIA SYSTEMS CONFERENCE (ACM MMSYS'19), 2019, : 279 - 284
  • [6] Vision Transformer-Based Tailing Detection in Videos
    Lee, Jaewoo
    Lee, Sungjun
    Cho, Wonki
    Siddiqui, Zahid Ali
    Park, Unsang
    APPLIED SCIENCES-BASEL, 2021, 11 (24):
  • [7] MR360: Mixed Reality Rendering for 360° Panoramic Videos
    Rhee, Taehyun
    Petikam, Lohit
    Allen, Benjamin
    Chalmers, Andrew
    IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS, 2017, 23 (04) : 1302 - 1311
  • [8] Interactive Panoramic Ray Tracing for Mixed 360° RGBD Videos
    Wu, Jian
    Wang, Lili
    Ke, Wei
    2023 IEEE CONFERENCE ON VIRTUAL REALITY AND 3D USER INTERFACES ABSTRACTS AND WORKSHOPS, VRW, 2023, : 777 - 778
  • [9] Viewing Bias Matters in 360° Videos Visual Saliency Prediction
    Chen, Peng-Wen
    Yang, Tsung-Shan
    Huang, Gi-Luen
    Huang, Chia-Wen
    Chao, Yu-Chieh
    Lu, Chien-Hung
    Wu, Pei-Yuan
    IEEE ACCESS, 2023, 11 : 46084 - 46094
  • [10] Saliency Detection on Videos with Scene Change
    Li, Junling
    Meng, Fang
    Mao, Jingbo
    2014 INTERNATIONAL CONFERENCE ON AUDIO, LANGUAGE AND IMAGE PROCESSING (ICALIP), VOLS 1-2, 2014, : 506 - 510