Saliency Detection of Panoramic Images Based on Robust Vision Transformer and Multiple Attention

Times cited: 0

Authors:

Chen, Xiaolei [1]

Zhang, Pengcheng

Lu, Yubing

Cao, Baoning

Affiliations:

[1] Lanzhou Univ Technol, Sch Elect Engn & Informat Engn, Lanzhou 730050, Peoples R China

Source:

JOURNAL OF ELECTRONICS & INFORMATION TECHNOLOGY | 2023 / Vol. 45 / No. 06

Funding:

National Natural Science Foundation of China;

Keywords:

Panoramic image; Saliency detection; Convolutional Neural Network(CNN); Vision transformer; Attention mechanism;

DOI:

10.11999/JEIT220684

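Chinese Library Classification (CLC):

TM [Electrical engineering]; TN [Electronic technology, communication technology];

Discipline classification codes: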

0808 ; 0809 ;

Abstract:

To address the low detection accuracy, slow convergence, and heavy computation of current panoramic image saliency detection methods, a U-Net with Robust vision transformer and Multiple attention modules (URMNet) is proposed. Sphere convolution is used to extract multi-scale features of panoramic images while reducing the distortion introduced by equirectangular projection. The robust vision transformer module extracts the salient information contained in the feature maps at four scales, and convolutional embedding is used to reduce the resolution of the feature maps and enhance the robustness of the model. The multiple attention module selectively integrates multi-dimensional attention according to the relationship between spatial attention and channel attention. Finally, the multi-layer features are gradually fused to form the panoramic image saliency map. A latitude-weighted loss function is used to speed up the convergence of the model. Experiments on two public datasets show that, owing to the robust vision transformer module and the multiple attention module, the proposed model outperforms six other advanced methods and further improves the saliency detection accuracy of panoramic images.
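
The abstract does not give the form of the latitude-weighted loss, so the following is only an illustrative sketch of the general idea, not the authors' formulation. It assumes a mean squared error in which each row of an equirectangular saliency map is weighted by the cosine of its latitude, so that the over-sampled polar rows contribute less than rows near the equator. The function names `latitude_weights` and `latitude_weighted_mse` are hypothetical.

```python
import math


def latitude_weights(height):
    """Per-row weights cos(latitude) for an equirectangular map with `height` rows.

    Assumption: row i is centred at latitude (0.5 - (i + 0.5) / height) * pi,
    which lies strictly between -pi/2 and pi/2, so every weight is positive.
    """
    return [math.cos((0.5 - (i + 0.5) / height) * math.pi) for i in range(height)]


def latitude_weighted_mse(pred, target):
    """Latitude-weighted mean squared error between two H x W maps (nested lists).

    This is an assumed loss form chosen for illustration only.
    """
    weights = latitude_weights(len(pred))
    weighted_error = 0.0
    weight_total = 0.0
    for w, pred_row, target_row in zip(weights, pred, target):
        for p, t in zip(pred_row, target_row):
            weighted_error += w * (p - t) ** 2
            weight_total += w
    return weighted_error / weight_total


# Usage: the same unit error costs less in a polar row than in an equatorial row.
zeros = [[0.0, 0.0] for _ in range(4)]
error_near_pole = [[1.0, 1.0], [0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
error_near_equator = [[0.0, 0.0], [1.0, 1.0], [0.0, 0.0], [0.0, 0.0]]
loss_pole = latitude_weighted_mse(zeros, error_near_pole)
loss_equator = latitude_weighted_mse(zeros, error_near_equator)
```

Under this weighting, an error in a row near the pole is penalized less than the same error near the equator, which matches the lower true spherical area that polar pixels cover after equirectangular projection.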


Pages: 2246-2255

Page count: 10

