Target Detection Method for Remote Sensing Images Based on Sparse Mask Transformer

Cited by: 0
Authors
Liu Xulun [1 ]
Ma Shiping [1 ]
He Linyuan [1 ,2 ]
Wang Chen [1 ]
He Xu [1 ]
Chen Zhe [3 ]
Affiliations
[1] Air Force Engn Univ, Sch Aeronaut Engn, Xian 710038, Shaanxi, Peoples R China
[2] Northwestern Polytech Univ, Unmanned Syst Res Inst, Xian 710072, Shaanxi, Peoples R China
[3] Xian Univ Posts & Telecommun, Sch Cyberspace Secur, Xian 710121, Shaanxi, Peoples R China
Keywords
Transformer; rotating object detection; self-attention; sparse mask;
DOI
10.3788/LOP202259.2228005
Chinese Library Classification (CLC)
TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology];
Discipline Codes
0808; 0809;
Abstract
To address the low detection accuracy caused by large differences in target scale and random orientation distributions in remote sensing images, this study proposes a remote sensing object detection method based on a sparse mask Transformer. First, an angle parameter is added to the Transformer network so that the rotation characteristics of remote sensing targets can be represented. Then, in the feature extraction stage, a multi-level feature pyramid is used as the input to handle the large size variations of remote sensing targets and to improve detection across scales, particularly for small targets. Finally, the self-attention module is replaced with a sparse-interpolation attention module, which effectively reduces the heavy computational cost of applying the Transformer network to high-resolution images and accelerates network convergence during training. Detection results on the large-scale remote sensing dataset DOTA show that the proposed method achieves a mean average precision (mAP) of 78.43% at a detection speed of 12.5 frames/s, an improvement of 3.07 percentage points over traditional methods, demonstrating its effectiveness.
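The record does not detail how the sparse-interpolation attention module works. A minimal sketch of the general idea behind sparse attention of this kind, attending to a strided subset of key/value tokens so the quadratic cost in sequence length shrinks by the sampling factor, is shown below. The `stride` parameter and the uniform-subsampling strategy are illustrative assumptions, not the paper's actual sampling scheme:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def sparse_attention(Q, K, V, stride=4):
    """Attend only to every `stride`-th key/value token.

    Full attention scores cost O(N^2); subsampling the keys
    reduces this to O(N^2 / stride) while preserving coverage
    of the whole sequence.
    """
    Ks, Vs = K[::stride], V[::stride]          # (N/stride, d)
    scores = Q @ Ks.T / np.sqrt(Q.shape[-1])   # (N, N/stride)
    return softmax(scores) @ Vs                # (N, d)

# Toy example: 64 tokens of dimension 32
N, d = 64, 32
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((N, d)) for _ in range(3))
out = sparse_attention(Q, K, V)
print(out.shape)  # (64, 32)
```

With `stride=4`, each query scores only 16 of the 64 keys, a 4x reduction in the attention matrix; the paper's interpolation step (recovering information for the skipped positions) is omitted here for brevity.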
Pages: 8