Swin-Transformer-Enabled YOLOv5 with Attention Mechanism for Small Object Detection on Satellite Images

Cited by: 118
Authors
Gong, Hang [1]
Mu, Tingkui [1]
Li, Qiuxia [1]
Dai, Haishan [2]
Li, Chunlai [3]
He, Zhiping [3]
Wang, Wenjing [1]
Han, Feng [1]
Tuniyazi, Abudusalamu [1]
Li, Haoyang [1]
Lang, Xuechan [1]
Li, Zhiyuan [1]
Wang, Bin [1]
Affiliations
[1] Xi An Jiao Tong Univ, Res Ctr Space Opt & Astron, Sch Phys, MOE Key Lab Nonequilibrium Synth & Modulat Conden, Xian 710049, Peoples R China
[2] Shanghai Acad Spaceflight Technol, Shanghai Inst Satellite Engn, Shanghai 201109, Peoples R China
[3] Chinese Acad Sci, Shanghai Inst Tech Phys, Shanghai 200083, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
satellite images; object detection; self-attention mechanism; Swin transformer; deep learning; CLASSIFICATION;
DOI
10.3390/rs14122861
Chinese Library Classification (CLC) number
X [Environmental Science, Safety Science];
Discipline classification code
08 ; 0830 ;
Abstract
Object detection in natural images has made tremendous progress over the last decade. However, results are rarely satisfactory when detection algorithms designed for natural images are applied directly to satellite images. This is due to intrinsic differences in the scale and orientation of objects that arise from the bird's-eye perspective of satellite imagery. Moreover, satellite image backgrounds are complex and object areas are small; as a result, small objects tend to be missed because their features are difficult to extract. Overlap and occlusion among densely packed objects also degrade detection performance. Although self-attention mechanisms have been introduced to detect small objects, their computational complexity increases with image resolution. To resolve these problems, we modified the general one-stage detector YOLOv5 to adapt it to satellite images. First, new feature fusion layers and a prediction head are added from the shallow layer for small object detection, for the first time, because the shallow layer maximally preserves feature information. Second, the original convolutional prediction heads are replaced with Swin Transformer Prediction Heads (SPHs), also for the first time. SPH is an advanced self-attention mechanism whose shifted window design reduces the computational complexity to linear in image size. Finally, Normalization-based Attention Modules (NAMs) are integrated into YOLOv5 to improve attention performance in a normalized way. The improved YOLOv5 is termed SPH-YOLOv5. It is evaluated on the NWPU-VHR10 and DOTA datasets, which are widely used for evaluating object detection in satellite images. Compared with the baseline YOLOv5, SPH-YOLOv5 improves the mean Average Precision (mAP) by 0.071 on the DOTA dataset.
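The abstract's point about complexity can be illustrated with a rough cost estimate. The sketch below is not taken from this paper: it uses the commonly quoted FLOP approximations from the original Swin Transformer work, 4hwC^2 + 2(hw)^2 C for global self-attention and 4hwC^2 + 2M^2 hwC for window-based self-attention. The function names, the channel width of 96, and the window size of 7 are illustrative assumptions.

```python
def global_msa_flops(h, w, c):
    """Approximate FLOPs of global self-attention on an h x w token grid with c channels."""
    # The second term is quadratic in the number of tokens (h * w).
    return 4 * h * w * c**2 + 2 * (h * w) ** 2 * c


def window_msa_flops(h, w, c, m=7):
    """Approximate FLOPs of window-based self-attention with m x m windows (m is an assumed default)."""
    # With m fixed, both terms are linear in the number of tokens (h * w).
    return 4 * h * w * c**2 + 2 * m**2 * h * w * c


if __name__ == "__main__":
    c = 96  # assumed channel width, for illustration only
    for side in (56, 112, 224):
        g = global_msa_flops(side, side, c)
        wv = window_msa_flops(side, side, c)
        print(f"{side}x{side} tokens: global={g:.3e}, windowed={wv:.3e}, ratio={g / wv:.1f}")
```

Under these assumptions, doubling the side length of the token grid multiplies the windowed cost by exactly four, while the global cost grows faster because of its quadratic term.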
Pages: 17
Related papers
50 items in total
  • [21] Small Object Detection for Birds with Swin Transformer
    Huo, Da
    Kastner, Marc A.
    Liu, Tingwei
    Kawanishi, Yasutomo
    Hirayama, Takatsugu
    Komamizu, Takahiro
    Ide, Ichiro
    2023 18TH INTERNATIONAL CONFERENCE ON MACHINE VISION AND APPLICATIONS, MVA, 2023,
  • [22] Dense Small Object Detection Algorithm Based on Improved YOLOv5 in UAV Aerial Images
    Chen, Jiahui
    Wang, Xiaohong
    Computer Engineering and Applications, 2024, 60 (03) : 100 - 109
  • [23] Modified YOLOv5 for small target detection in aerial images
    Singh, Inderpreet
    Munjal, Geetika
    MULTIMEDIA TOOLS AND APPLICATIONS, 2023, 83 (18) : 53221 - 53242
  • [24] Improved YOLOv5 Object Detection Algorithm for Remote Sensing Images
    Yang, Chen
    She, Lu
    Yang, Lu
    Feng, Zixian
    Computer Engineering and Applications, 2023, 59 (15) : 76 - 86
  • [25] Modified YOLOv5 for small target detection in aerial images
    Inderpreet Singh
    Geetika Munjal
    Multimedia Tools and Applications, 2024, 83 : 53221 - 53242
  • [26] C3TB-YOLOv5: integrated YOLOv5 with transformer for object detection in high-resolution remote sensing images
    Wu, Qinggang
    Li, Yang
    Huang, Wei
    Chen, Qiqiang
    Wu, Yonglei
    INTERNATIONAL JOURNAL OF REMOTE SENSING, 2024, 45 (08) : 2622 - 2650
  • [27] Mitotic cell detection in histopathological images of neuroendocrine tumors using improved YOLOv5 by transformer mechanism
    Zehra Yücel
    Fuat Akal
    Pembe Oltulu
    Signal, Image and Video Processing, 2023, 17 : 4107 - 4114
  • [28] Mitotic cell detection in histopathological images of neuroendocrine tumors using improved YOLOv5 by transformer mechanism
    Yucel, Zehra
    Akal, Fuat
    Oltulu, Pembe
    SIGNAL IMAGE AND VIDEO PROCESSING, 2023, 17 (08) : 4107 - 4114
  • [29] Improved Lightweight YOLOv5 Using Attention Mechanism for Satellite Components Recognition
    Li, Cong
    Zhao, Gaopeng
    Gu, Dongqing
    Wang, Zebin
    IEEE SENSORS JOURNAL, 2023, 23 (01) : 514 - 526
  • [30] Nonwoven Fabric Defect Detection Based on Swin Transformer and YOLOv5
    Liu, Jiawei
    Cao, Jiangtao
    Ji, Xiaofei
    Journal of Liaoning Petrochemical University, 2024, 44 (03) : 80 - 88