Joint Spatial and Temporal Feature Enhancement Network for Disturbed Object Detection

Citations: 0
Authors
Zhang, Fan [1, 2]
Ji, Hongbing [1, 2]
Zhang, Yongquan [1, 2]
Zhu, Zhigang [1, 2]
Affiliations
[1] XIDIAN UNIV, Xian Key Lab Intelligent Spectrum Sensing & Inform, Xian 710071, Peoples R China
[2] XIDIAN UNIV, Shaanxi Union Res Ctr Univ & Enterprise Intelligen, Xian 710071, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Feature extraction; Object detection; Semantics; Aggregates; Detectors; Proposals; Correlation; Video object detection; local-global context; deformable temporal sampling; temporal attention;
DOI
10.1109/TCSVT.2024.3432900
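Chinese Library Classification
TM [Electrical engineering]; TN [Electronics and communication technology];
Discipline classification codes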
0808 ; 0809 ;
Abstract
Video object detection remains a challenging task due to appearance degradation in certain frames. Existing studies usually aggregate temporal information from multiple frames to enhance the object's appearance representation. Although significant detection performance has been achieved, there are still two shortcomings: (1) The spatial context information within each frame is not fully exploited, which can provide additional decision support when objects are corrupted; (2) In the feature alignment phase, traditional methods tend to employ one-to-one or one-to-global temporal alignment strategies, overlooking the local temporal correlation of objects. To address the above issues, we propose a Joint Spatial and Temporal Feature Enhancement Network (JSTFE-Net) for video object detection, which can jointly utilize spatial-temporal information. First, we present a novel local-global context enhancement module to effectively encode intra-frame spatial context information. This module can enhance the learning of both local details and global semantic information of objects, thereby facilitating accurate object perception within the spatial domain. Second, we develop a deformable temporal sampling module, which adaptively samples correlated temporal information according to the motion information between frames. In addition, to improve the aggregation of temporal-correlated sampled features from multiple frames, we devise an attention-based temporal aggregation block, which dynamically fuses these feature points based on their temporal similarity with the corresponding object feature point. Note that our JSTFE-Net can be effortlessly plugged into image object detectors and state-of-the-art video object detectors. Extensive experiments on the ImageNet VID dataset show that the proposed JSTFE-Net can consistently and significantly improve performance, demonstrating its effectiveness in video object detection.
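The attention-based temporal aggregation step described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `temporal_attention_aggregate`, the tensor shapes, and the use of cosine similarity followed by a softmax are all assumptions chosen for illustration. The sketch only shows the general idea of fusing sampled temporal features with weights derived from their similarity to the corresponding feature point in the current frame.

```python
import numpy as np

def temporal_attention_aggregate(query, sampled):
    """Similarity-weighted fusion of temporally sampled features (illustrative).

    query:   (N, C) array, features of N points in the current frame
    sampled: (N, K, C) array, K features sampled from other frames per point
    Returns the fused (N, C) features and the (N, K) attention weights.
    """
    eps = 1e-8
    # Assumption: cosine similarity stands in for "temporal similarity".
    q = query / (np.linalg.norm(query, axis=-1, keepdims=True) + eps)
    s = sampled / (np.linalg.norm(sampled, axis=-1, keepdims=True) + eps)
    sim = np.einsum("nc,nkc->nk", q, s)

    # Numerically stable softmax over the K samples of each point.
    sim = sim - sim.max(axis=1, keepdims=True)
    weights = np.exp(sim)
    weights /= weights.sum(axis=1, keepdims=True)

    # Weighted sum of the sampled features.
    fused = np.einsum("nk,nkc->nc", weights, sampled)
    return fused, weights
```

Under these assumptions, a sampled feature that closely matches the current-frame feature receives a larger weight than a dissimilar one, so degraded or unrelated samples contribute less to the fused representation.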
Pages: 12258 - 12273
Number of pages: 16
Related papers
50 in total
  • [31] VFEDet: A Variational Information Bottleneck Based Feature Enhancement Object Detection Network
    Wu, Mingyu
    Zhu, Ming
    Tang, Ruixue
    TWELFTH INTERNATIONAL CONFERENCE ON GRAPHICS AND IMAGE PROCESSING (ICGIP 2020), 2021, 11720
  • [32] FecNet: A Feature Enhancement and Cascade Network for Object Detection Using Roadside LiDAR
    Gong, Ziren
    Wang, Zhangyu
    Yu, Guizhen
    Liu, Wentao
    Yang, Songyue
    Zhou, Bin
    IEEE SENSORS JOURNAL, 2023, 23 (19) : 23780 - 23791
  • [33] Receptive field enhancement and attention feature fusion network for underwater object detection
    Xu, Huipu
    He, Zegang
    Chen, Shuo
    JOURNAL OF ELECTRONIC IMAGING, 2024, 33 (03) : 33007
  • [34] A feature temporal attention based interleaved network for fast video object detection
    Yanni Yang
    Huansheng Song
    Shijie Sun
    Yan Chen
    Xinyao Tang
    Qin Shi
    Journal of Ambient Intelligence and Humanized Computing, 2023, 14 : 497 - 509
  • [35] A feature temporal attention based interleaved network for fast video object detection
    Yang, Yanni
    Song, Huansheng
    Sun, Shijie
    Chen, Yan
    Tang, Xinyao
    Shi, Qin
    JOURNAL OF AMBIENT INTELLIGENCE AND HUMANIZED COMPUTING, 2021, 14 (1) : 497 - 509
  • [36] Feature Enhancement and Reconstruction for Small Object Detection
    Zhang, Chong-Jian
    Chen, Song-Lu
    Liu, Qi
    Huang, Zhi-Yong
    Chen, Feng
    Yin, Xu-Cheng
    MULTIMEDIA MODELING, MMM 2023, PT I, 2023, 13833 : 16 - 27
  • [37] Feature Enhancement and Alignment for Oriented Object Detection
    Xie, Xu
    You, Zhi-Hui
    Chen, Si-Bao
    Huang, Li-Li
    Tang, Jin
    Luo, Bin
    IEEE JOURNAL OF SELECTED TOPICS IN APPLIED EARTH OBSERVATIONS AND REMOTE SENSING, 2024, 17 : 778 - 787
  • [38] Object detection algorithm based on feature enhancement
    Zheng, Qiumei
    Wang, Lulu
    Wang, Fenghua
    MEASUREMENT SCIENCE AND TECHNOLOGY, 2021, 32 (08)
  • [39] DNTFE-Net: Distant Neighboring-Temporal Feature Enhancement Network for side scan sonar small object detection
    Zhao, Boyu
    Zhou, Qian
    Huang, Lijun
    Zhang, Qiang
    EXPERT SYSTEMS WITH APPLICATIONS, 2024, 258
  • [40] Selective Feature Network for Object Detection
    Cui, Yuning
    Shi, Dianxi
    Zhang, Yongjun
    Sung, Qianchong
    2020 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2020