End-to-end learning interpolation for object tracking in low frame-rate video

Cited by: 7
Authors
Liu, Liqiang [1, 2]
Cao, Jianzhong [1]
Affiliations
[1] Chinese Acad Sci, Xian Inst Opt & Precis Mech, 17 Xinxi Rd, Xian, Peoples R China
[2] Univ Chinese Acad Sci, 19 Yuquan Rd, Beijing, Peoples R China
Keywords
video signal processing; learning (artificial intelligence); object tracking; interpolation; mobile computing; low frame rates; implicit video frame interpolation sub-network; low frame-rate video; high frame-rate latent video; effective end-to-end optimisation; frame rate; tracking accuracy; semantic video analytics; end-to-end learning interpolation; subsequent semantic analytics; bandwidth constraints; analytics performance; MOTION ESTIMATION; SIAMESE NETWORKS;
DOI
10.1049/iet-ipr.2019.0944
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In many scenarios where videos are transmitted over bandwidth-limited channels for subsequent semantic analytics, the choice of frame rate must balance bandwidth constraints against analytics performance. Faced with this practical challenge, this study focuses on enhancing object tracking at low frame rates and proposes a learning interpolation for tracking framework. The framework embeds an implicit video frame interpolation sub-network, which is concatenated with, and jointly trained alongside, an object tracking sub-network. A low frame-rate input video is first mapped into a high frame-rate latent video, on which the tracker is then learned. Novel strategies and loss functions are derived to ensure effective end-to-end optimisation of the authors' network. On several challenging benchmarks and settings, their method achieves a highly competitive trade-off between frame rate and tracking accuracy. As far as is known, the implications of interpolation for semantic video analytics and tracking remain unexplored, and the authors expect their method to find many applications in mobile embedded vision, the Internet of Things and edge computing.
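The data flow described in the abstract (low frame-rate input, latent high frame-rate sequence, tracker, one loss driving both parts) can be illustrated with a deliberately tiny sketch. Everything in it is an assumption made for illustration and is not the authors' network: frames are reduced to scalar object positions, the interpolation sub-network is replaced by a single learnable blend weight `w`, the tracker by a single gain `gain`, and gradients are taken by finite differences. It also assumes that tracking targets are available at the latent (high) frame rate, which the abstract does not state. All function names are hypothetical.

```python
# Toy sketch only: scalar "frames" stand in for images or feature maps.
# None of these names or design choices come from the paper.

def interpolate(frames, w):
    """Insert one latent frame between each consecutive pair (2x frame rate).

    `w` is a learnable blend weight, a stand-in for the interpolation sub-network.
    """
    latent = [frames[0]]
    for a, b in zip(frames, frames[1:]):
        latent.append(a + w * (b - a))  # latent in-between frame
        latent.append(b)                # original frame is kept
    return latent


def track(latent, gain):
    """Toy tracker: predicts one position per latent frame."""
    return [gain * x for x in latent]


def tracking_loss(params, frames, targets):
    """Mean squared error of the tracker output; the only training signal."""
    w, gain = params
    preds = track(interpolate(frames, w), gain)
    return sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(targets)


def train(frames, targets, params=(0.0, 0.5), lr=0.05, steps=500, eps=1e-6):
    """Jointly update interpolation and tracker parameters from one loss."""
    params = list(params)
    for _ in range(steps):
        base = tracking_loss(params, frames, targets)
        grads = []
        for i in range(len(params)):
            shifted = list(params)
            shifted[i] += eps
            # Forward finite difference approximates d(loss)/d(params[i]).
            grads.append((tracking_loss(shifted, frames, targets) - base) / eps)
        params = [p - lr * g for p, g in zip(params, grads)]
    return params


low_rate = [0.0, 2.0, 4.0]                 # every second frame of the video
high_rate_truth = [0.0, 1.0, 2.0, 3.0, 4.0]  # assumed dense tracking targets
w, gain = train(low_rate, high_rate_truth)
```

In this sketch no separate interpolation loss is used: the blend weight is shaped only by how well the tracker performs on the latent sequence, which is the sense in which the two parts are optimised end to end here. The loss functions and training strategies actually derived in the paper are not reproduced.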
Pages: 1066-1072
Number of pages: 7