Target Detection Method for Remote Sensing Images Based on Sparse Mask Transformer

Cited by: 0
Authors
Liu Xulun [1 ]
Ma Shiping [1 ]
He Linyuan [1 ,2 ]
Wang Chen [1 ]
He Xu [1 ]
Chen Zhe [3 ]
Affiliations
[1] Air Force Engn Univ, Sch Aeronaut Engn, Xian 710038, Shaanxi, Peoples R China
[2] Northwestern Polytech Univ, Unmanned Syst Res Inst, Xian 710072, Shaanxi, Peoples R China
[3] Xian Univ Posts & Telecommun, Sch Cyberspace Secur, Xian 710121, Shaanxi, Peoples R China
Keywords
Transformer; rotating object detection; self-attention; sparse mask;
DOI
10.3788/LOP202259.2228005
Chinese Library Classification (CLC)
TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology];
Discipline Codes
0808; 0809;
Abstract
To address the low detection accuracy caused by large differences in target scale and random orientation distributions in remote sensing images, this study proposes a remote sensing object detection method based on a sparse mask Transformer. First, an angle parameter is added to the Transformer network so that the rotation characteristics of remote sensing targets can be represented. Then, in the feature extraction stage, a multi-level feature pyramid is used as the input to handle the large size variations of remote sensing targets and to improve detection across scales, particularly for small targets. Finally, the self-attention module is replaced with a sparse-interpolation attention module, which effectively reduces the heavy computational cost of applying the Transformer network to high-resolution images and accelerates network convergence during training. Detection results on the large-scale remote sensing dataset DOTA show that the proposed method achieves a mean average precision (mAP) of 78.43% at a detection speed of 12.5 frames/s, an improvement of 3.07 percentage points over traditional methods, demonstrating its effectiveness.
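The record does not detail how the sparse-interpolation attention module works. A minimal sketch of the general idea behind sparse attention of this kind, attending to a strided subset of key/value tokens so the quadratic cost in sequence length shrinks by the sampling factor, is shown below. The `stride` parameter and the uniform-subsampling strategy are illustrative assumptions, not the paper's actual sampling scheme:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def sparse_attention(Q, K, V, stride=4):
    """Attend only to every `stride`-th key/value token.

    Full attention scores cost O(N^2); subsampling the keys
    reduces this to O(N^2 / stride) while preserving coverage
    of the whole sequence.
    """
    Ks, Vs = K[::stride], V[::stride]          # (N/stride, d)
    scores = Q @ Ks.T / np.sqrt(Q.shape[-1])   # (N, N/stride)
    return softmax(scores) @ Vs                # (N, d)

# Toy example: 64 tokens of dimension 32
N, d = 64, 32
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((N, d)) for _ in range(3))
out = sparse_attention(Q, K, V)
print(out.shape)  # (64, 32)
```

With `stride=4`, each query scores only 16 of the 64 keys, a 4x reduction in the attention matrix; the paper's interpolation step (recovering information for the skipped positions) is omitted here for brevity.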
Pages: 8