MINet: Meta-Learning Instance Identifiers for Video Object Detection

Cited by: 13
Authors
Deng, Jiajun [1]
Pan, Yingwei [2]
Yao, Ting [2]
Zhou, Wengang [1]
Li, Houqiang [1]
Mei, Tao [2]
Affiliations
[1] Univ Sci & Technol China USTC, Dept Elect Engn & Informat Sci, Hefei 230026, Peoples R China
[2] JD AI Res, Beijing 100105, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Object detection; Feature extraction; Detectors; Proposals; Optical imaging; Robustness; History; Video object detection; meta learning; memory network; box association; NETWORKS;
DOI
10.1109/TIP.2021.3099409
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
Recent advances in video object detection have centered on exploiting temporal coherence across frames to enhance object detectors. Nevertheless, previous solutions rely either on additional inputs (e.g., optical flow) to guide feature aggregation or on complex post-processing to associate bounding boxes. In this paper, we introduce a simple but effective design that learns instance identifiers for instance association in a meta-learning paradigm, which requires no auxiliary inputs or post-processing. Specifically, we present Meta-Learnt Instance Identifier Networks (MINet), which meta-learn instance identifiers to recognize identical instances across frames in a single forward pass, leading to robust online linking of instances. Technically, based on the detection results of previous frames, we teach MINet to learn the weights of an instance identifier on the fly, which can then be applied to upcoming frames. Such a meta-learning paradigm enables instance identifiers to be flexibly adapted to novel frames at inference. Furthermore, MINet writes/updates the detection results of previous instances into memory and reads from memory when performing inference, to encourage temporal consistency for video object detection. MINet is appealing in the sense that it can be plugged into any object detection model. Extensive experiments on the ImageNet VID dataset demonstrate the superiority of MINet. More remarkably, by integrating MINet into Faster R-CNN, we achieve 80.2% mAP on the ImageNet VID dataset.
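The idea of learning an identifier's weights on the fly from previous-frame detections and applying it to upcoming frames can be illustrated with a toy sketch. This is not the authors' MINet architecture: as a stand-in assumption, the identifier here is a linear classifier fitted in closed form by ridge regression, and the names `fit_identifier` and `link_instances`, the feature vectors, and the regularization value are all hypothetical.

```python
import numpy as np


def fit_identifier(features, ids, num_ids, reg=0.1):
    """Fit linear identifier weights from previous-frame detections.

    Assumption: closed-form ridge regression onto one-hot instance IDs,
    used only as a simple stand-in for a meta-learned weight predictor.
    """
    targets = np.eye(num_ids)[ids]                      # (N, num_ids) one-hot
    gram = features.T @ features + reg * np.eye(features.shape[1])
    return np.linalg.solve(gram, features.T @ targets)  # (D, num_ids)


def link_instances(weights, new_features):
    """Assign each new detection to the instance ID with the highest score."""
    return np.argmax(new_features @ weights, axis=1)


# Hypothetical per-box features from previous frames, with known instance IDs.
prev_features = np.array([
    [1.0, 0.0, 0.1],
    [0.0, 1.0, 0.1],
    [0.9, 0.1, 0.0],
    [0.1, 0.9, 0.0],
])
prev_ids = np.array([0, 1, 0, 1])

# Weights are computed "on the fly" from past detections ...
weights = fit_identifier(prev_features, prev_ids, num_ids=2)

# ... and then applied to detections in an upcoming frame.
new_features = np.array([
    [0.05, 0.95, 0.05],
    [0.95, 0.05, 0.05],
])
linked = link_instances(weights, new_features)
print(linked.tolist())  # [1, 0]
```

In this toy setting the first new box is linked to instance 1 and the second to instance 0, without optical flow or a separate post-processing step; the memory read/write component described in the abstract is not modeled here.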
Pages: 6879-6891
Number of pages: 13
Related papers
50 in total
  • [21] Swin Transformer-Based Object Detection Model Using Explainable Meta-Learning Mining
    Baek, Ji-Won
    Chung, Kyungyong
    APPLIED SCIENCES-BASEL, 2023, 13 (05):
  • [22] Meta-learning few shot object detection algorithm based on channel and spatial attention mechanisms
    Jiang, Lianyuan
    Chen, Jinlong
    Yang, Minghao
    PROCEEDINGS OF 2023 7TH INTERNATIONAL CONFERENCE ON ELECTRONIC INFORMATION TECHNOLOGY AND COMPUTER ENGINEERING, EITCE 2023, 2023, : 897 - 903
  • [23] Multiple Instance Differentiation Learning for Active Object Detection
    Wan, Fang
    Ye, Qixiang
    Yuan, Tianning
    Xu, Songcen
    Liu, Jianzhuang
    Ji, Xiangyang
    Huang, Qingming
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2023, 45 (10) : 12133 - 12147
  • [24] Smooth Multi-instance Learning for Object Detection
    Li, Dayuan
    Li, Zhipeng
    Zhang, Youhua
    INTELLIGENT COMPUTING THEORIES AND APPLICATION, ICIC 2017, PT I, 2017, 10361 : 758 - 767
  • [25] Salient Object Detection via Multiple Instance Learning
    Huang, Fang
    Qi, Jinqing
    Lu, Huchuan
    Zhang, Lihe
    Ruan, Xiang
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2017, 26 (04) : 1911 - 1922
  • [26] Learn from Object Counting: Crowd Counting with Meta-learning
    Zan, Changtong
    Liu, Baodi
    Guan, Weili
    Zhang, Kai
    Liu, Weifeng
    IET IMAGE PROCESSING, 2021, 15 (14) : 3543 - 3550
  • [27] Sharp Multiple Instance Learning for DeepFake Video Detection
    Li, Xiaodan
    Lang, Yining
    Chen, Yuefeng
    Mao, Xiaofeng
    He, Yuan
    Wang, Shuhui
    Xue, Hui
    Lu, Quan
    MM '20: PROCEEDINGS OF THE 28TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, 2020, : 1864 - 1872
  • [28] Multiple Instance Relational Learning for Video Anomaly Detection
    Dengxiong, Xiwen
    Bao, Wentao
    Kong, Yu
    2021 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2021,
  • [29] Deep learning for object detection in video
    Lu, Shengyu
    2018 INTERNATIONAL SEMINAR ON COMPUTER SCIENCE AND ENGINEERING TECHNOLOGY (SCSET 2018), 2019, 1176
  • [30] Learning Meta-Learning (LML) dataset: Survey data of meta-learning parameters
    Corraya, Sonia
    Al Mamun, Shamim
    Kaiser, M. Shamim
    DATA IN BRIEF, 2023, 51