Exploring Spatial-Temporal Instance Relationships in an Intermediate Domain for Image-to-Video Object Detection

Cited by: 0
Authors
Wen, Zihan [1]
Chen, Jin [1]
Wu, Xinxiao [1]
Affiliations
[1] Beijing Inst Technol, Sch Comp Sci, Beijing Lab Intelligent Informat Technol, Beijing, Peoples R China
Source
Keywords
Deep learning; Object detection; Domain adaptation;
DOI
10.1007/978-3-031-27066-6_25
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
Image-to-video object detection leverages annotated images to help detect objects in unannotated videos, reducing the heavy dependence on expensive annotation of large-scale video frames. This task is extremely challenging because of the serious domain discrepancy between images and video frames caused by appearance variance and motion blur. Previous methods perform both image-level and instance-level alignment to reduce the domain discrepancy, but false instance alignments may limit their performance in real scenarios. We propose a novel spatial-temporal graph that models the contextual relationships between instances to alleviate these false alignments. Through message propagation over the graph, the visual information from spatially and temporally neighboring object proposals is adaptively aggregated to enhance the current instance representation. Moreover, to adapt the source-biased decision boundary to the target data, we generate an intermediate domain between images and frames. Our method can also be applied as a plug-and-play component to other image-to-video object detection models based on instance alignment. Experiments on several datasets demonstrate the effectiveness of our method. Code will be available at: https://github.com/wenzihan/STMP.
Pages: 360-375
Page count: 16