Exploring Spatial-Temporal Instance Relationships in an Intermediate Domain for Image-to-Video Object Detection

Cited by: 0

Authors:

Wen, Zihan [1]

Chen, Jin [1]

Wu, Xinxiao [1]

Affiliations:

[1] Beijing Inst Technol, Sch Comp Sci, Beijing Lab Intelligent Informat Technol, Beijing, Peoples R China

Source:

COMPUTER VISION - ACCV 2022 WORKSHOPS | 2023 / Vol. 13848

Keywords:

Deep learning; Object detection; Domain adaptation;

DOI:

10.1007/978-3-031-27066-6_25

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Image-to-video object detection leverages annotated images to help detect objects in unannotated videos, so as to break the heavy dependency on the expensive annotation of large-scale video frames. This task is extremely challenging due to the serious domain discrepancy between images and video frames caused by appearance variance and motion blur. Previous methods perform both image-level and instance-level alignments to reduce the domain discrepancy, but the existing false instance alignments may limit their performance in real scenarios. We propose a novel spatial-temporal graph to model the contextual relationships between instances to alleviate the false alignments. Through message propagation over the graph, the visual information from the spatially and temporally neighboring object proposals is adaptively aggregated to enhance the current instance representation. Moreover, to adapt the source-biased decision boundary to the target data, we generate an intermediate domain between images and frames. It is worth mentioning that our method can be easily applied as a plug-and-play component to other image-to-video object detection models based on instance alignment. Experiments on several datasets demonstrate the effectiveness of our method. Code will be available at: https://github.com/wenzihan/STMP.
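
As a rough illustration of the message-propagation idea described in the abstract, the sketch below runs one round of similarity-weighted aggregation over object-proposal features. It is not the authors' implementation: the function name `propagate`, the cosine-similarity edge weights, the `temperature` parameter, and the boolean adjacency layout are all assumptions made for this example.

```python
import numpy as np

def propagate(features, adjacency, temperature=1.0):
    """One round of similarity-weighted message passing (illustrative only).

    features:  (N, D) float array, one row per object proposal (assumed layout).
    adjacency: (N, N) bool array, True where two proposals are spatial or
               temporal neighbors. Self-loops are added internally.
    """
    n = features.shape[0]
    adj = adjacency | np.eye(n, dtype=bool)
    # Cosine similarity between proposals acts as the unnormalized edge weight.
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    unit = features / np.maximum(norms, 1e-12)
    scores = unit @ unit.T / temperature
    # Mask out non-neighbors, then apply a row-wise softmax.
    scores = np.where(adj, scores, -np.inf)
    scores = scores - scores.max(axis=1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=1, keepdims=True)
    # Each proposal becomes a convex combination of itself and its neighbors.
    return weights @ features, weights

# Toy graph with four proposals (hypothetical data).
rng = np.random.default_rng(0)
feats = rng.normal(size=(4, 8))
adj = np.zeros((4, 4), dtype=bool)
adj[0, 1] = adj[1, 0] = True  # spatial edge: proposals 0 and 1 share a frame
adj[1, 2] = adj[2, 1] = True  # temporal edge: proposal 2 is in a nearby frame
enhanced, weights = propagate(feats, adj)
```

Proposal 3 has no neighbors in this toy graph, so its feature passes through unchanged, while proposals 0 to 2 are blended with their neighbors in proportion to feature similarity.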


Pages: 360-375

Page count: 16

50 records in total

[1] Spatial-temporal Causal Inference for Partial Image-to-video Adaptation
Chen, Jin
Wu, Xinxiao
Hu, Yao
Luo, Jiebo
THIRTY-FIFTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THIRTY-THIRD CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE AND THE ELEVENTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2021, 35 : 1027 - 1035
[2] Video Object Detection with an Aligned Spatial-Temporal Memory
Xiao, Fanyi
Lee, Yong Jae
COMPUTER VISION - ECCV 2018, PT VIII, 2018, 11212 : 494 - 510
[3] SPATIAL-TEMPORAL FEATURE AGGREGATION NETWORK FOR VIDEO OBJECT DETECTION
Chen, Zhu
Li, Weihai
Fei, Chi
Liu, Bin
Yu, Nenghai
2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 1858 - 1862
[4] Multilevel Spatial-Temporal Feature Aggregation for Video Object Detection
Xu, Chao
Zhang, Jiangning
Wang, Mengmeng
Tian, Guanzhong
Liu, Yong
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2022, 32 (11) : 7809 - 7820
[5] Deep Spatial-Temporal Joint Feature Representation for Video Object Detection
Zhao, Baojun
Zhao, Boya
Tang, Linbo
Han, Yuqi
Wang, Wenzheng
SENSORS, 2018, 18 (03)
[6] Object Detection-Based Video Retargeting With Spatial-Temporal Consistency
Lee, Seung Joon
Lee, Siyeong
Cho, Sung In
Kang, Suk-Ju
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2020, 30 (12) : 4434 - 4439
[7] Learning Complementary Spatial-Temporal Transformer for Video Salient Object Detection
Liu, Nian
Nan, Kepan
Zhao, Wangbo
Yao, Xiwen
Han, Junwei
IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2024, 35 (08) : 10663 - 10673
[8] End-to-End Video Object Detection with Spatial-Temporal Transformers
He, Lu
Zhou, Qianyu
Li, Xiangtai
Niu, Li
Cheng, Guangliang
Li, Xiao
Liu, Wenxuan
Tong, Yunhai
Ma, Lizhuang
Zhang, Liqing
PROCEEDINGS OF THE 29TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2021, 2021, : 1507 - 1516
[9] Exploring spatial-temporal features fusion model for Deepfake video detection
Wu, Jiujiu
Zhou, Jiyu
Wang, Danyu
Wang, Lin
JOURNAL OF ELECTRONIC IMAGING, 2023, 32 (06)
[10] SALIENT OBJECT DETECTION IN IMAGE SEQUENCES VIA SPATIAL-TEMPORAL CUE
Gan, Chuang
Qin, Zengchang
Xu, Jia
Wan, Tao
2013 IEEE INTERNATIONAL CONFERENCE ON VISUAL COMMUNICATIONS AND IMAGE PROCESSING (IEEE VCIP 2013), 2013,
