Smooth-IoU Loss for Bounding Box Regression in Visual Tracking

Cited by: 0

Authors:

Li G. [1]

Zhao W. [1]

Liu P. [1]

Tang X.-L. [1]

Affiliation:

[1] Pattern Recognition and Intelligence System Research Center, Harbin Institute of Technology, Harbin

Source:

Zidonghua Xuebao/Acta Automatica Sinica | 2023, Vol. 49, No. 02

Funding:

National Natural Science Foundation of China;

Keywords:

bounding box regression; Smooth-IoU loss; visual tracking; ℓ_n-norm loss;

DOI:

10.16383/j.aas.c210525

CLC number:

Subject classification code:

Abstract:

The bounding box regression branch is a critical module in visual object trackers, and its performance directly affects tracker accuracy. One evaluation metric used to measure accuracy is intersection over union (IoU). The IoU loss, which was proposed to replace the ℓ_n-norm loss for bounding box regression, is increasingly popular. However, the IoU loss has two inherent issues: first, the bounding box parameters cannot be updated via gradient descent if the predicted box does not intersect the ground-truth box; second, the gradient does not exist at the IoU optimum, so it is difficult to regress the predicted box to that optimum. We reveal the explicit relationship among the parameters of the IoU-optimal bounding box during regression, and point out that the size of a predicted box that makes the IoU loss optimal is not unique when its center lies in specific areas, which increases the uncertainty of bounding box regression. From the perspective of optimizing the divergence between two distributions, we propose a smooth-IoU (SIoU) loss, a globally smooth (continuously differentiable) loss function with a unique extremum. The smooth-IoU loss naturally implies a specific optimal relationship among the bounding box parameters, and its gradient exists over the whole domain, making it easier to regress the predicted box to the extremal bounding box; the unique extremum ensures that the parameters can be updated via gradient descent. In addition, the proposed smooth-IoU loss can be easily incorporated into existing trackers by replacing the IoU-based loss used to train bounding box regression. Extensive experiments on visual tracking benchmarks including LaSOT, GOT-10k, TrackingNet, OTB2015, and VOT2018 demonstrate that the smooth-IoU loss achieves state-of-the-art performance, confirming its effectiveness and efficiency. © 2023 Science Press. All rights reserved.
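The first issue named in the abstract (no usable gradient when the predicted and ground-truth boxes are disjoint) can be illustrated with the standard IoU loss, 1 - IoU. The following is a minimal sketch, not the paper's smooth-IoU formulation, which this record does not give. The function names (`iou`, `iou_loss`, `numeric_grad`), the (x1, y1, x2, y2) box encoding, and the example boxes are assumptions chosen for illustration only.

```python
def iou(box_a, box_b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap, clamped to zero for disjoint boxes.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def iou_loss(pred, target):
    """Standard IoU loss: 1 - IoU."""
    return 1.0 - iou(pred, target)


def numeric_grad(pred, target, eps=1e-3):
    """Central finite-difference gradient of the IoU loss w.r.t. the predicted box."""
    grads = []
    for i in range(4):
        hi = list(pred)
        lo = list(pred)
        hi[i] += eps
        lo[i] -= eps
        grads.append((iou_loss(hi, target) - iou_loss(lo, target)) / (2 * eps))
    return grads


# Hypothetical boxes chosen only to show the two regimes.
target = (0.0, 0.0, 2.0, 2.0)
disjoint = (5.0, 5.0, 7.0, 7.0)      # no overlap with target
overlapping = (1.0, 1.0, 3.0, 3.0)   # partial overlap with target

grad_disjoint = numeric_grad(disjoint, target)
grad_overlap = numeric_grad(overlapping, target)
```

For the disjoint box the loss stays at its maximum under small perturbations, so every component of `grad_disjoint` is zero and gradient descent cannot move the box toward the target; for the overlapping box the gradient is nonzero.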

Citation

Pages: 288-306

Page count: 18
