Smooth-IoU Loss for Bounding Box Regression in Visual Tracking

Cited by: 0
Authors
Li G. [1 ]
Zhao W. [1 ]
Liu P. [1 ]
Tang X.-L. [1 ]
Affiliations
[1] Pattern Recognition and Intelligence System Research Center, Harbin Institute of Technology, Harbin
Source
Acta Automatica Sinica
Funding
National Natural Science Foundation of China
Keywords
bounding box regression; Smooth-IoU loss; visual tracking; ℓn-norm loss
DOI
10.16383/j.aas.c210525
Abstract
The bounding box regression branch is a critical module in visual object trackers, and its performance directly affects the accuracy of a tracker. One of the evaluation metrics used to measure this accuracy is the intersection over union (IoU). The IoU loss, proposed to replace the ℓn-norm loss for bounding box regression, has become increasingly popular. However, the IoU loss has two inherent issues: first, the parameters of the bounding box cannot be updated via gradient descent when the predicted box does not intersect the ground-truth box; second, the gradient at the IoU optimum does not exist, which makes it difficult to regress the predicted box to the IoU-optimal one. We reveal the explicit relationship among the parameters of the IoU-optimal bounding box during regression, and point out that the size of a predicted box that makes the IoU loss optimal is not unique when its center lies in specific areas, which increases the uncertainty of bounding box regression. From the perspective of optimizing the divergence between two distributions, we propose a smooth-IoU (SIoU) loss, a globally smooth (continuously differentiable) loss function with a unique extremum. The smooth-IoU loss naturally encodes a specific optimal relationship among the bounding box parameters, and its gradient exists over the whole domain, making it easier to regress the predicted box to the extremal bounding box, while the unique extremum ensures that the parameters can always be updated via gradient descent. In addition, the proposed smooth-IoU loss can be easily incorporated into existing trackers by replacing the IoU-based loss used to train bounding box regression. Extensive experiments on visual tracking benchmarks, including LaSOT, GOT-10k, TrackingNet, OTB2015, and VOT2018, demonstrate that the smooth-IoU loss achieves state-of-the-art performance, confirming its effectiveness and efficiency. © 2023 Science Press. All rights reserved.
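The following is a minimal PyTorch sketch of the plain IoU loss, not the paper's SIoU formulation (whose formula is not given in this abstract). It illustrates the first issue described above: when the predicted box and the ground-truth box do not overlap, the IoU loss is constant and its gradient with respect to the box parameters is zero, so gradient descent cannot move the predicted box. The (x1, y1, x2, y2) box format and the iou_loss helper are assumptions made for illustration.

```python
# Minimal sketch of the plain IoU loss and its vanishing gradient for
# non-overlapping boxes (assumed (x1, y1, x2, y2) box format).
import torch

def iou_loss(pred, target, eps=1e-7):
    # Intersection rectangle; clamp(min=0) handles the non-overlapping case.
    ix1 = torch.max(pred[0], target[0])
    iy1 = torch.max(pred[1], target[1])
    ix2 = torch.min(pred[2], target[2])
    iy2 = torch.min(pred[3], target[3])
    inter = (ix2 - ix1).clamp(min=0) * (iy2 - iy1).clamp(min=0)
    area_p = (pred[2] - pred[0]) * (pred[3] - pred[1])
    area_t = (target[2] - target[0]) * (target[3] - target[1])
    iou = inter / (area_p + area_t - inter + eps)
    return 1.0 - iou

pred = torch.tensor([0.0, 0.0, 1.0, 1.0], requires_grad=True)  # predicted box
target = torch.tensor([3.0, 3.0, 4.0, 4.0])                     # disjoint ground truth

loss = iou_loss(pred, target)
loss.backward()
print(loss.item())  # 1.0: IoU is 0 for disjoint boxes
print(pred.grad)    # all zeros: no signal to move the predicted box
```

A smooth, globally differentiable surrogate with a unique extremum, as the abstract proposes, is intended to avoid exactly this flat region as well as the non-differentiability at the IoU optimum.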
Pages: 288-306
Number of pages: 18