Enhanced Multi-scale Target Detection Method Based on YOLOv5

Cited: 0
Authors
Hui K. [1 ]
Yang W. [1 ]
Liu H. [1 ]
Zhang Z. [1 ]
Zheng J. [2 ]
Bai X. [2 ]
Affiliations
[1] College of Computer Science & Technology, Civil Aviation University of China, Tianjin
[2] School of Computer Science and Technology, Beihang University, Beijing
Source
Binggong Xuebao/Acta Armamentarii | 2023, Vol. 44, No. 09
Keywords
clustering algorithm; feature fusion; multi-scale convolution; target detection; YOLOv5 model
DOI
10.12382/bgxb.2022.1147
Abstract
To address the difficulty of matching initial anchor boxes to targets and the weak multi-scale detection ability of YOLOv5 in complex scenes, an enhanced multi-scale target detection method based on YOLOv5 is proposed. First, the K-means++ clustering algorithm is used to obtain multi-scale initial anchors suited to the current detection scene, making it easier for the network to capture targets at different scales. Then, several parallel convolution branches with different kernel scales are added to the Bottleneck structure; while retaining the original feature information, multi-scale feature information is fused to enhance the global perception ability of the model. The proposed EM-YOLOv5s model is tested on the VisDrone2019, COCO2017, and PASCAL VOC2012 datasets. The experimental results show that, compared with the YOLOv5s model, key indicators such as mAP@0.5:0.95 and mAP@0.5 are improved; on PASCAL VOC2012, mAP@0.5:0.95 increases by 5.2% while the detection time increases by only 1.9 ms, indicating that the EM-YOLOv5 model can effectively improve target detection accuracy in general complex scenes. © 2023 China Ordnance Society. All rights reserved.
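The anchor-initialization step described in the abstract can be sketched as follows. This is a minimal illustrative implementation, not the authors' code: it assumes the standard YOLO-style practice of clustering (width, height) pairs of ground-truth boxes with 1 − IoU as the distance, combined with K-means++ seeding (first center chosen uniformly, later centers sampled in proportion to squared distance). Function names (`wh_iou`, `kmeanspp_anchors`) are hypothetical.

```python
import numpy as np

def wh_iou(boxes, anchors):
    # IoU between (w, h) pairs assuming a shared top-left corner,
    # the usual distance metric for anchor clustering in YOLO-style models
    inter = np.minimum(boxes[:, None, 0], anchors[None, :, 0]) * \
            np.minimum(boxes[:, None, 1], anchors[None, :, 1])
    union = (boxes[:, 0] * boxes[:, 1])[:, None] \
            + anchors[None, :, 0] * anchors[None, :, 1] - inter
    return inter / union

def kmeanspp_anchors(boxes, k, iters=30, seed=0):
    """Cluster (w, h) box sizes into k anchors with K-means++ seeding."""
    rng = np.random.default_rng(seed)
    # K-means++ seeding: first center uniform at random; each later center
    # sampled with probability proportional to its squared distance (1 - IoU)
    centers = boxes[rng.integers(len(boxes))][None, :].astype(float)
    while len(centers) < k:
        d = 1.0 - wh_iou(boxes, centers).max(axis=1)
        probs = d**2 / (d**2).sum()
        centers = np.vstack([centers, boxes[rng.choice(len(boxes), p=probs)]])
    # standard Lloyd iterations under the IoU distance
    for _ in range(iters):
        assign = wh_iou(boxes, centers).argmax(axis=1)
        for j in range(k):
            if (assign == j).any():
                centers[j] = boxes[assign == j].mean(axis=0)
    # return anchors sorted by area (small to large)
    return centers[np.argsort(centers.prod(axis=1))]
```

For the usual three detection heads of YOLOv5, `k = 9` would be used and the sorted anchors split into groups of three per scale.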
Pages: 2600-2610
Page count: 10