Optical remote sensing image object detection based on multi-resolution feature fusion

Cited by: 0
Authors
Yao Y. [1,2]
Cheng G. [1,2]
Xie X. [1,2]
Han J. [2]
Affiliations
[1] Research & Development Institute of Northwestern Polytechnical University in Shenzhen, Shenzhen
[2] School of Automation, Northwestern Polytechnical University, Xi'an
Funding
National Natural Science Foundation of China
Keywords
Convolutional neural networks; Multi-resolution feature fusion; Object detection; Remote sensing images;
DOI
10.11834/jrs.20210505
Abstract
In recent years, object detection in high-resolution remote sensing images has attracted increasing interest and become an important research area of computer vision because of its wide civil and military applications, such as environmental monitoring, urban planning, precision agriculture, and land mapping. Deep-learning-based object detection frameworks designed for natural scenes have made breakthrough progress and perform well on open natural-scene data sets. However, when directly applied to remote sensing images, these algorithms do not achieve the expected results: owing to the large variation of object sizes and the high inter-class similarity in remote sensing images, most conventional detectors designed for natural scene images still face considerable challenges. To address these challenges, we propose an end-to-end multi-resolution feature fusion framework for object detection in remote sensing images, which effectively improves detection accuracy. Specifically, we use a Feature Pyramid Network (FPN) to extract multi-scale feature maps. A Multi-resolution Feature Extraction (MFE) module is then inserted into the feature layers of different scales; it encourages the network to learn feature representations of objects at different resolutions and narrows the semantic gap between scales. Next, an Adaptive Feature Fusion (AFF) module fuses the multi-resolution features to obtain more discriminative representations. Finally, a Dual-scale Feature Deep Fusion (DFDF) module fuses the two adjacent-scale features output by the AFF module. In the experiments, to demonstrate the effectiveness of each component (MFE, AFF, and DFDF), we first conducted extensive ablation studies on the large-scale remote sensing image data set DIOR; the MFE, AFF, and DFDF modules improve the average detection accuracy by 1.4%, 0.5%, and 1.3%, respectively, over the baseline method. Furthermore, we evaluated our method on two publicly available remote sensing object detection data sets, DIOR and DOTA, and obtained mAP improvements of 2.5% and 2.2%, respectively, over Faster R-CNN with FPN. The ablation and comparison results indicate that our method extracts more discriminative and powerful feature representations than Faster R-CNN with FPN, which significantly boosts detection accuracy, and that it works well for densely arranged and multi-scale objects. Nevertheless, some aspects still require improvement. For example, our method performs poorly on objects with large aspect ratios, such as bridges, possibly because most anchor-based methods have difficulty guaranteeing a sufficiently high intersection-over-union with ground-truth boxes of large aspect ratio. Our future work will address these problems by exploring the advantages of anchor-free methods. © 2021, Science Press. All rights reserved.
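For readers who want a concrete picture of the pipeline described above, the following PyTorch sketch wires FPN-style feature maps through MFE, AFF, and DFDF stages. The abstract does not specify the internals of these modules, so every design choice here (multi-rate re-convolution for MFE, per-pixel softmax weighting for AFF, concatenation plus convolution for DFDF, 256-channel features, and all class and variable names) is an illustrative assumption, not the authors' implementation.

# Hedged sketch of the multi-resolution fusion pipeline described in the
# abstract. Module internals are assumed, not taken from the paper.
import torch
import torch.nn as nn
import torch.nn.functional as F


class MFE(nn.Module):
    """Assumed Multi-resolution Feature Extraction: re-samples one FPN level
    to coarser/finer resolutions, convolves each view, and maps it back."""
    def __init__(self, channels):
        super().__init__()
        self.convs = nn.ModuleList(
            [nn.Conv2d(channels, channels, 3, padding=1) for _ in range(3)])

    def forward(self, x):
        outs = []
        for scale, conv in zip((0.5, 1.0, 2.0), self.convs):
            y = x if scale == 1.0 else F.interpolate(
                x, scale_factor=scale, mode="bilinear", align_corners=False)
            y = conv(y)
            if scale != 1.0:  # restore the original spatial size
                y = F.interpolate(y, size=x.shape[-2:], mode="bilinear",
                                  align_corners=False)
            outs.append(y)
        return outs  # three resolution-specific views of the same level


class AFF(nn.Module):
    """Assumed Adaptive Feature Fusion: predicts per-pixel softmax weights
    and takes a weighted sum of the multi-resolution views."""
    def __init__(self, channels, num_branches=3):
        super().__init__()
        self.weight = nn.Conv2d(channels * num_branches, num_branches, 1)

    def forward(self, feats):
        w = torch.softmax(self.weight(torch.cat(feats, dim=1)), dim=1)
        return sum(w[:, i:i + 1] * f for i, f in enumerate(feats))


class DFDF(nn.Module):
    """Assumed Dual-scale Feature Deep Fusion: upsamples the coarser of two
    adjacent levels and fuses it with the finer one."""
    def __init__(self, channels):
        super().__init__()
        self.fuse = nn.Sequential(
            nn.Conv2d(channels * 2, channels, 1),
            nn.Conv2d(channels, channels, 3, padding=1))

    def forward(self, fine, coarse):
        coarse = F.interpolate(coarse, size=fine.shape[-2:], mode="bilinear",
                               align_corners=False)
        return self.fuse(torch.cat([fine, coarse], dim=1))


if __name__ == "__main__":
    # Two adjacent FPN levels with 256 channels (a common FPN setting).
    p3 = torch.randn(1, 256, 64, 64)
    p4 = torch.randn(1, 256, 32, 32)
    mfe, aff, dfdf = MFE(256), AFF(256), DFDF(256)
    f3, f4 = aff(mfe(p3)), aff(mfe(p4))
    fused = dfdf(f3, f4)   # in the paper this would feed the detection head
    print(fused.shape)     # torch.Size([1, 256, 64, 64])

In this sketch the adjacent-scale pairing (p3, p4) and the 256-channel width are placeholders; the paper's actual channel counts, number of resolution branches, and fusion order may differ.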
Pages: 1124-1137
Number of pages: 13