Video object detection remains a challenging task due to appearance degradation in certain frames. Existing studies usually aggregate temporal information from multiple frames to enhance the object's appearance representation. Although these methods achieve significant detection performance, two shortcomings remain: (1) the spatial context within each frame is not fully exploited, even though it can provide additional decision support when object appearance is degraded; (2) in the feature alignment phase, traditional methods tend to employ one-to-one or one-to-global temporal alignment strategies, overlooking the local temporal correlation of objects. To address these issues, we propose a Joint Spatial and Temporal Feature Enhancement Network (JSTFE-Net) for video object detection, which jointly exploits spatial and temporal information. First, we present a novel local-global context enhancement module that effectively encodes intra-frame spatial context. This module enhances the learning of both local details and global semantics of objects, thereby facilitating accurate object perception in the spatial domain. Second, we develop a deformable temporal sampling module that adaptively samples correlated temporal information according to the motion between frames. In addition, to better aggregate the temporally correlated features sampled from multiple frames, we devise an attention-based temporal aggregation block that dynamically fuses these sampled feature points according to their temporal similarity with the corresponding object feature point. Notably, JSTFE-Net can be readily plugged into image object detectors as well as state-of-the-art video object detectors. Extensive experiments on the ImageNet VID dataset show that the proposed JSTFE-Net consistently and significantly improves performance, demonstrating its effectiveness for video object detection.
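To make the two temporal components concrete, the following is a minimal PyTorch sketch of deformable temporal sampling driven by predicted offsets and similarity-weighted temporal aggregation. It is not the paper's implementation: the module names, channel sizes, number of sampling points, and the offset/similarity formulations are illustrative assumptions only.

```python
# Hypothetical sketch of deformable temporal sampling + attention-based aggregation.
# All names and hyperparameters are assumptions for illustration, not the authors' code.
import torch
import torch.nn as nn
import torch.nn.functional as F


class DeformableTemporalSampling(nn.Module):
    """Samples K points from a support frame at offsets predicted from the
    concatenated reference/support features (a simple proxy for inter-frame motion)."""

    def __init__(self, channels: int, num_points: int = 4):
        super().__init__()
        self.num_points = num_points
        # Predict (dx, dy) for each of the K sampling points per spatial location.
        self.offset_pred = nn.Conv2d(2 * channels, 2 * num_points, kernel_size=3, padding=1)

    def forward(self, ref_feat: torch.Tensor, sup_feat: torch.Tensor) -> torch.Tensor:
        # ref_feat, sup_feat: (B, C, H, W)
        B, C, H, W = ref_feat.shape
        offsets = self.offset_pred(torch.cat([ref_feat, sup_feat], dim=1))
        offsets = offsets.view(B, self.num_points, 2, H, W)

        # Base sampling grid in normalized [-1, 1] coordinates for grid_sample.
        ys, xs = torch.meshgrid(
            torch.linspace(-1, 1, H, device=ref_feat.device),
            torch.linspace(-1, 1, W, device=ref_feat.device),
            indexing="ij",
        )
        base_grid = torch.stack([xs, ys], dim=-1)  # (H, W, 2)

        sampled = []
        for k in range(self.num_points):
            # Convert pixel offsets to the normalized coordinate range.
            dx = offsets[:, k, 0] / max(W - 1, 1) * 2.0
            dy = offsets[:, k, 1] / max(H - 1, 1) * 2.0
            grid = base_grid.unsqueeze(0) + torch.stack([dx, dy], dim=-1)  # (B, H, W, 2)
            sampled.append(F.grid_sample(sup_feat, grid, align_corners=True))
        return torch.stack(sampled, dim=1)  # (B, K, C, H, W)


class AttentionTemporalAggregation(nn.Module):
    """Fuses sampled points with weights given by their similarity to the
    reference feature at the same spatial location."""

    def forward(self, ref_feat: torch.Tensor, sampled: torch.Tensor) -> torch.Tensor:
        # ref_feat: (B, C, H, W); sampled: (B, K, C, H, W)
        sim = (ref_feat.unsqueeze(1) * sampled).sum(dim=2, keepdim=True)  # (B, K, 1, H, W)
        weights = F.softmax(sim / ref_feat.shape[1] ** 0.5, dim=1)         # softmax over K points
        return (weights * sampled).sum(dim=1)                              # (B, C, H, W)


if __name__ == "__main__":
    ref = torch.randn(1, 64, 32, 32)   # reference-frame feature map
    sup = torch.randn(1, 64, 32, 32)   # support-frame feature map
    sampler = DeformableTemporalSampling(channels=64, num_points=4)
    aggregator = AttentionTemporalAggregation()
    enhanced = aggregator(ref, sampler(ref, sup))
    print(enhanced.shape)  # torch.Size([1, 64, 32, 32])
```

In this sketch the offsets play the role of the motion-guided sampling locations and the softmax weights play the role of the temporal-similarity attention; the actual JSTFE-Net modules may differ in both structure and detail.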