ACSiamRPN: Adaptive Context Sampling for Visual Object Tracking

Times Cited: 4
Authors
Qin, Xiaofei [1 ,2 ,3 ]
Zhang, Yipeng [4 ]
Chang, Hang [5 ]
Lu, Hao [6 ]
Zhang, Xuedian [1 ,2 ,3 ,7 ]
Affiliations
[1] Univ Shanghai Sci & Technol, Sch Opt Elect & Comp Engn, Shanghai 200093, Peoples R China
[2] Shanghai Key Lab Contemporary Opt Syst, Shanghai 200093, Peoples R China
[3] Minist Educ, Key Lab Biomed Opt Technol & Devices, Shanghai 200093, Peoples R China
[4] Univ Shanghai Sci & Technol, Sch Mech Engn, Shanghai 200093, Peoples R China
[5] Lawrence Berkeley Natl Lab, Berkeley, CA 94720 USA
[6] Guangxi Yuchai Machinery Co Ltd, Nanning 530007, Peoples R China
[7] Tongji Univ, Shanghai Inst Intelligent Sci & Technol, Shanghai 200092, Peoples R China
Keywords
visual object tracking; SiamRPN; global context; selective kernel convolution; Siamese networks
DOI
10.3390/electronics9091528
Chinese Library Classification (CLC)
TP [Automation technology; Computer technology]
Discipline Code
0812
Abstract
In the field of visual object tracking, the Siamese network tracker based on the region proposal network (SiamRPN) has achieved promising results in both speed and accuracy. However, it does not model the relationships among, or the differences between, the long-range context information of various objects. In this paper, we add a global context block (GC block), which is lightweight and effectively models long-range dependencies, to the Siamese network part of SiamRPN so that the tracker can better understand the tracking scene. We also propose a novel convolution module, the cropping-inside selective kernel block (CiSK block), based on selective kernel convolution (SK convolution, a module proposed in selective kernel networks), and use it in the region proposal network (RPN) part of SiamRPN; it adaptively adjusts the size of the receptive field for different types of objects. We make two improvements to SK convolution in the CiSK block. First, in the fusion step of SK convolution, we use both global average pooling (GAP) and global maximum pooling (GMP) to enhance global information embedding. Second, after the selection step of SK convolution, we crop out the outermost pixels of the features to reduce the impact of padding operations. Experimental results show that on the OTB100 benchmark we achieve an accuracy of 0.857 and a success rate of 0.643; on the VOT2016 and VOT2019 benchmarks we achieve expected average overlap (EAO) scores of 0.394 and 0.240, respectively.
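The two improvements to SK convolution described in the abstract — fusing GAP with GMP for the global descriptor, and cropping the outermost pixel ring after selection — can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the learned fully connected layer that would normally produce per-channel branch logits is replaced by a placeholder, and all shapes and names are assumptions for illustration only.

```python
import numpy as np

def cisk_fuse(feat_a, feat_b):
    """Hedged sketch of CiSK-style fusion/selection over two branch features.

    feat_a, feat_b: (C, H, W) feature maps from two branches with different
    receptive fields (e.g., 3x3 and 5x5 kernels in SK convolution).
    """
    u = feat_a + feat_b                       # element-wise fusion of branches
    gap = u.mean(axis=(1, 2))                 # global average pooling -> (C,)
    gmp = u.max(axis=(1, 2))                  # global maximum pooling -> (C,)
    s = gap + gmp                             # GAP + GMP global descriptor
    # Placeholder for the learned FC layer: two logits per channel, then a
    # softmax over the branch axis gives per-channel soft attention weights.
    logits = np.stack([s, -s])                # (2, C), illustrative only
    weights = np.exp(logits) / np.exp(logits).sum(axis=0)
    v = (weights[0][:, None, None] * feat_a
         + weights[1][:, None, None] * feat_b)
    # Crop the outermost pixel ring to suppress padding artifacts.
    return v[:, 1:-1, 1:-1]

out = cisk_fuse(np.ones((4, 7, 7)), np.zeros((4, 7, 7)))
print(out.shape)  # (4, 5, 5): one-pixel border removed on each side
```

The cropping step mirrors the "cropping-inside" idea in the block's name: because zero padding contaminates border activations, discarding the outer ring keeps only positions computed from valid context.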
Pages: 1-13 (13 pages)
Related Papers
(50 records)
  • [21] Using Discriminative Motion Context for Online Visual Object Tracking
    Duffner, Stefan
    Garcia, Christophe
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2016, 26 (12) : 2215 - 2225
  • [22] Multiple Context Features in Siamese Networks for Visual Object Tracking
    Morimitsu, Henrique
    COMPUTER VISION - ECCV 2018 WORKSHOPS, PT I, 2019, 11129 : 116 - 131
  • [23] Adaptive Discriminative Deep Correlation Filter for Visual Object Tracking
    Han, Zhenjun
    Wang, Pan
    Ye, Qixiang
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2020, 30 (01) : 155 - 166
  • [24] Adaptive Dynamic Model Particle Filter for Visual Object Tracking
    Zhang, JiXiang
    Tian, Yuan
    Yang, YiPing
    2009 ISECS INTERNATIONAL COLLOQUIUM ON COMPUTING, COMMUNICATION, CONTROL, AND MANAGEMENT, VOL I, 2009, : 333 - 336
  • [25] Adaptive cascaded and parallel feature fusion for visual object tracking
    Wang, Jun
    Li, Sixuan
    Li, Kunlun
    Zhu, Qizhen
    VISUAL COMPUTER, 2024, 40 (03) : 2119 - 2138
  • [26] Scale Adaptive Dense Structural Learning for Visual Object Tracking
    Yu, Xianguo
    Yu, Qifeng
    Zhang, Hongliang
    PROCEEDINGS OF 2018 10TH INTERNATIONAL CONFERENCE ON COMPUTER AND AUTOMATION ENGINEERING (ICCAE 2018), 2018, : 87 - 91
  • [28] Adaptive object tracking algorithm based on eigenbasis space and compressive sampling
    Li, J.
    Wang, J.
    IET IMAGE PROCESSING, 2012, 6 (08) : 1170 - 1180
  • [29] Adaptive spatio-temporal context learning for visual tracking
    Zhang, Yaqin
    Wang, Liejun
    Qin, Jiwei
    IMAGING SCIENCE JOURNAL, 2019, 67 (03) : 136 - 147
  • [30] Figure/ground modeling combined with the context matching for visual object tracking
    Bordbar, Saghar
    Agahi, Hamed
    Mahmoodzadeh, Azar
    INTELLIGENT DECISION TECHNOLOGIES-NETHERLANDS, 2019, 13 (03) : 355 - 366