Exploring Spatial-Temporal Instance Relationships in an Intermediate Domain for Image-to-Video Object Detection

Cited by: 0
Authors
Wen, Zihan [1 ]
Chen, Jin [1 ]
Wu, Xinxiao [1 ]
Affiliations
[1] Beijing Inst Technol, Sch Comp Sci, Beijing Lab Intelligent Informat Technol, Beijing, Peoples R China
Source
Keywords
Deep learning; Object detection; Domain adaptation
DOI
10.1007/978-3-031-27066-6_25
Chinese Library Classification (CLC)
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Image-to-video object detection leverages annotated images to help detect objects in unannotated videos, thereby reducing the heavy dependence on expensive annotation of large-scale video frames. This task is extremely challenging due to the serious domain discrepancy between images and video frames caused by appearance variation and motion blur. Previous methods perform both image-level and instance-level alignments to reduce the domain discrepancy, but false instance alignments may limit their performance in real scenarios. To alleviate such false alignments, we propose a novel spatial-temporal graph that models the contextual relationships between instances. Through message propagation over the graph, visual information from spatially and temporally neighboring object proposals is adaptively aggregated to enhance the representation of the current instance. Moreover, to adapt the source-biased decision boundary to the target data, we generate an intermediate domain between images and frames. It is worth mentioning that our method can be easily applied as a plug-and-play component to other image-to-video object detection models based on instance alignment. Experiments on several datasets demonstrate the effectiveness of our method. Code will be available at: https://github.com/wenzihan/STMP.
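The message-propagation idea described in the abstract can be illustrated with a small sketch. The following is a minimal, hypothetical PyTorch example, not the authors' released STMP code: it builds an affinity graph over pooled proposal features gathered from a frame and its temporal neighbors, then mixes each feature with a weighted sum of its neighbors. The function names (build_affinity, propagate), the scaled dot-product affinity, and the single propagation round with a fixed mixing weight are illustrative assumptions.

```python
# Minimal sketch (illustrative only) of message propagation over a
# spatial-temporal instance graph: each proposal feature is enhanced by
# adaptively aggregating features of neighboring proposals collected from
# the current frame and its temporally adjacent frames.

import torch
import torch.nn.functional as F


def build_affinity(feats: torch.Tensor) -> torch.Tensor:
    """Row-normalized pairwise affinity between proposal features.

    feats: (N, D) pooled features of N proposals from a frame and its
    temporal neighbors. Returns an (N, N) matrix whose rows sum to 1.
    """
    sim = feats @ feats.t() / feats.shape[1] ** 0.5  # scaled dot-product similarity
    sim.fill_diagonal_(float("-inf"))                # exclude self-connections before softmax
    return F.softmax(sim, dim=1)


def propagate(feats: torch.Tensor, alpha: float = 0.5) -> torch.Tensor:
    """One round of message passing: mix each proposal with its weighted neighbors."""
    adj = build_affinity(feats)
    messages = adj @ feats                           # aggregate neighbor information
    return alpha * feats + (1.0 - alpha) * messages  # residual-style enhancement


if __name__ == "__main__":
    # 12 proposals with 256-d features, e.g. RoI-pooled from 3 consecutive frames.
    proposals = torch.randn(12, 256)
    enhanced = propagate(proposals)
    print(enhanced.shape)  # torch.Size([12, 256])
```

In practice the affinity and mixing weights would be learned and restricted to spatial and temporal neighbors, but the fixed-weight version above is enough to show how neighboring proposals contribute to the enhanced instance representation.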
Pages: 360 - 375
Number of pages: 16
Related papers (50 items in total)
  • [31] Plane Wave Image Formation in Spatial-Temporal Frequency Domain
    Liu, D-L Donald
    Ji, Ting-Lan
    2016 IEEE INTERNATIONAL ULTRASONICS SYMPOSIUM (IUS), 2016,
  • [32] Self-supervised spatial-temporal feature enhancement for one-shot video object detection
    Yao, Xudong
    Yang, Xiaoshan
    NEUROCOMPUTING, 2024, 601
  • [33] ISTVT: Interpretable Spatial-Temporal Video Transformer for Deepfake Detection
    Zhao, Cairong
    Wang, Chutian
    Hu, Guosheng
    Chen, Haonan
    Liu, Chun
    Tang, Jinhui
    IEEE TRANSACTIONS ON INFORMATION FORENSICS AND SECURITY, 2023, 18 : 1335 - 1348
  • [34] Spatial-Temporal Structural and Dynamics Features for Video Fire Detection
    Wang, Hongcheng
    Finn, Alan
    Erdinc, Ozgur
    Vincitore, Antonio
    2013 IEEE WORKSHOP ON APPLICATIONS OF COMPUTER VISION (WACV), 2013, : 513 - 519
  • [35] Slow Video Detection Based on Spatial-Temporal Feature Representation
    Ma, Jianyu
    Yao, Haichao
    Ni, Rongrong
    Zhao, Yao
    PATTERN RECOGNITION AND COMPUTER VISION, PT III, 2021, 13021 : 298 - 309
  • [36] Spatial-temporal graph attention network for video anomaly detection
    Chen, Haoyang
    Mei, Xue
    Ma, Zhiyuan
    Wu, Xinhong
    Wei, Yachuan
    IMAGE AND VISION COMPUTING, 2023, 131
  • [37] An Efficient Spatial-Temporal Polyp Detection Framework for Colonoscopy Video
    Zhang, Pengfei
    Sun, Xinzi
    Wang, Dechun
    Wang, Xizhe
    Cao, Yu
    Liu, Benyuan
    2019 IEEE 31ST INTERNATIONAL CONFERENCE ON TOOLS WITH ARTIFICIAL INTELLIGENCE (ICTAI 2019), 2019, : 1252 - 1259
  • [38] Multisensor video fusion based on spatial-temporal salience detection
    Zhang, Qiang
    Chen, Yueling
    Wang, Long
    SIGNAL PROCESSING, 2013, 93 (09) : 2485 - 2499
  • [39] Scene Cut Detection in Video by using Combination of Spatial-Temporal Video Characteristics
    Jokovic, Jugoslav
    Dordevic, Danilo
    TELSIKS 2009, VOLS 1 AND 2, 2009, : 479 - 482
  • [40] STAIR: Spatial-Temporal Reasoning with Auditable Intermediate Results for Video Question Answering
    Wang, Yueqian
    Wang, Yuxuan
    Chen, Kai
    Zhao, Dongyan
    THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 17, 2024, : 19215 - 19223