Saliency Detection of Panoramic Images Based on Robust Vision Transformer and Multiple Attention

Authors
Chen, Xiaolei [1]
Zhang, Pengcheng
Lu, Yubing
Cao, Baoning
Affiliations
[1] Lanzhou Univ Technol, Sch Elect Engn & Informat Engn, Lanzhou 730050, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Panoramic image; Saliency detection; Convolutional Neural Network (CNN); Vision transformer; Attention mechanism
DOI
10.11999/JEIT220684
Chinese Library Classification number
TM [Electrical engineering]; TN [Electronic technology, communication technology]
Discipline classification code
0808; 0809
Abstract
To address the low detection accuracy, slow model convergence, and heavy computation of current panoramic image saliency detection methods, a U-Net with Robust vision transformer and Multiple attention modules (URMNet) is proposed. Sphere convolution is used to extract multi-scale features of panoramic images while reducing the distortion introduced by equirectangular projection. The robust vision transformer module extracts the salient information contained in the feature maps at four scales, and convolutional embedding is used to reduce the resolution of the feature maps and enhance the robustness of the model. The multiple attention module selectively integrates multi-dimensional attention according to the relationship between spatial attention and channel attention. Finally, the multi-layer features are gradually fused to form a panoramic image saliency map. A latitude-weighted loss function is used to speed up the convergence of the model. Experiments on two public datasets show that, owing to the robust vision transformer module and the multiple attention module, the proposed model outperforms six other advanced methods and further improves the saliency detection accuracy of panoramic images.
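The abstract does not give the form of the latitude-weighted loss. As a rough illustration only, the sketch below assumes a common choice for equirectangular images: each row is weighted by the cosine of its latitude, so the stretched polar rows contribute less to a mean squared error. The function names `latitude_weights` and `latitude_weighted_mse` are hypothetical and are not taken from the paper.

```python
import math


def latitude_weights(height):
    """Per-row weights for an equirectangular image with `height` rows.

    Assumption: weight = cos(latitude) of the row center, which is
    proportional to the area each pixel covers on the sphere.
    """
    weights = []
    for row in range(height):
        # Map the row center to a latitude in (-pi/2, pi/2).
        lat = (row + 0.5) / height * math.pi - math.pi / 2
        weights.append(math.cos(lat))
    return weights


def latitude_weighted_mse(pred, target):
    """Weighted mean squared error between two saliency maps.

    `pred` and `target` are lists of rows (lists of floats) of equal shape.
    """
    row_w = latitude_weights(len(pred))
    num = 0.0
    den = 0.0
    for r, (pred_row, target_row) in enumerate(zip(pred, target)):
        for p, t in zip(pred_row, target_row):
            num += row_w[r] * (p - t) ** 2
            den += row_w[r]
    return num / den
```

Under this assumed weighting, the same pixel error costs less near the poles than near the equator, which matches the intent of down-weighting regions distorted by the projection.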
Pages: 2246-2255
Number of pages: 10