Smooth-IoU Loss for Bounding Box Regression in Visual Tracking

Authors
Li G. [1]
Zhao W. [1]
Liu P. [1]
Tang X.-L. [1]
Affiliation
[1] Pattern Recognition and Intelligence System Research Center, Harbin Institute of Technology, Harbin
Source
Zidonghua Xuebao/Acta Automatica Sinica | 2023, Vol. 49, No. 2
Funding
National Natural Science Foundation of China
Keywords
bounding box regression; Smooth-IoU loss; visual tracking; ℓn-norm loss
DOI
10.16383/j.aas.c210525
Abstract
The bounding box regression branch is a critical module in visual object trackers, and its performance directly affects tracking accuracy. Intersection over union (IoU) is one of the metrics used to evaluate this accuracy, and the IoU loss, proposed as a replacement for the ℓn-norm loss in bounding box regression, has become increasingly popular. However, the IoU loss has two inherent issues. First, the parameters of the bounding box cannot be updated via gradient descent when the predicted box does not intersect the ground-truth box. Second, the gradient does not exist at the IoU optimum, which makes it difficult to regress the predicted box to that optimum. We derive the explicit relationship among the parameters of the IoU-optimal bounding box during regression and show that, when the center of the predicted box lies in specific areas, the box size that optimizes the IoU loss is not unique, which increases the uncertainty of bounding box regression. From the perspective of optimizing the divergence between two distributions, we propose the smooth-IoU (SIoU) loss, a globally smooth (continuously differentiable) loss function with a unique extremum. The smooth-IoU loss naturally implies a specific optimal relationship among the bounding box parameters. Its gradient exists over the entire domain, which makes it easier to regress the predicted box to the extremal bounding box, and its unique extremum ensures that the parameters can be updated via gradient descent. In addition, the smooth-IoU loss can be incorporated into existing trackers simply by replacing their IoU-based loss when training the bounding box regression branch. Extensive experiments on the visual tracking benchmarks LaSOT, GOT-10k, TrackingNet, OTB2015, and VOT2018 demonstrate that the smooth-IoU loss achieves state-of-the-art performance, confirming its effectiveness and efficiency. © 2023 Science Press. All rights reserved.
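The first issue named in the abstract (no usable gradient when the boxes do not intersect) can be illustrated with a minimal sketch of the plain IoU loss. This is not the paper's smooth-IoU loss, whose formula is not given in this record. The box format (x1, y1, x2, y2), the loss definition 1 - IoU, the helper names, and the example coordinates are all assumptions made for illustration.

```python
def iou(box_a, box_b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2) (assumed format)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the intersection, clamped to zero when boxes are disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def iou_loss(pred, target):
    """Plain IoU loss, taken here as 1 - IoU (an assumed, common definition)."""
    return 1.0 - iou(pred, target)


def numeric_grad(pred, target, eps=1e-4):
    """Central finite-difference gradient of the loss w.r.t. the four box parameters."""
    grads = []
    for i in range(4):
        plus = list(pred)
        minus = list(pred)
        plus[i] += eps
        minus[i] -= eps
        grads.append((iou_loss(plus, target) - iou_loss(minus, target)) / (2 * eps))
    return grads


# Hypothetical example boxes.
target = (0.0, 0.0, 2.0, 2.0)
disjoint = (5.0, 5.0, 7.0, 7.0)      # no overlap with the target
overlapping = (1.0, 1.0, 3.0, 3.0)   # partial overlap with the target

grad_disjoint = numeric_grad(disjoint, target)
grad_overlap = numeric_grad(overlapping, target)

print("disjoint gradient:   ", grad_disjoint)
print("overlapping gradient:", grad_overlap)
```

For the disjoint box, the loss stays at 1 under any small perturbation, so every gradient component is zero and gradient descent cannot move the prediction toward the target. The overlapping box yields non-zero components.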
Pages: 288-306
Page count: 18