Relation-Guided Multi-stage Feature Aggregation Network for Video Object Detection

Cited by: 0
Authors
Yao, Tingting [1]
Cao, Fuxiao [1]
Mi, Fuheng [1]
Li, Danmeng [1]
Affiliations
[1] Dalian Maritime Univ, Coll Informat Sci & Technol, Dalian 116026, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Video object detection; Temporal context information; Feature aggregation; Temporal relation-guided;
DOI
10.1007/978-981-99-8537-1_12
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Video object detection has received extensive research attention, and various methods have been proposed. The quality of a single frame in the original video is often degraded by motion blur and object occlusion, which leads to detection failures. Although some methods attempt to enhance the feature representation of each frame by aggregating temporal context information from other frames, they are usually sensitive to changes in object appearance and scale, which leads to false or missed detections. In this paper, we therefore propose a Relation-guided Multi-stage Feature Aggregation (RMFA) network for video object detection. First, a Multi-Stage Feature Aggregation (MSFA) framework is devised to aggregate the feature representations of global and local support frames in each stage, so that both global semantic information and local motion information can be better captured. Furthermore, a Multi-sources Feature Aggregation (MFA) module is proposed to enhance the quality of support frames, which in turn improves the feature representation of the current frame. Finally, a Temporal Relation-Guided (TRG) module is proposed to improve the feature aggregation perception by supervising the semantic similarity relationships between different object proposals, thereby enhancing the model's ability to selectively store valuable features. Qualitative and quantitative experimental results on the ImageNet VID dataset demonstrate that our model achieves superior video object detection results compared with a number of state-of-the-art methods. In particular, our model performs especially well when an object is occluded or in fast motion.
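The abstract does not give the aggregation formula, so the following is only a generic illustration of similarity-weighted feature aggregation across frames, not the RMFA, MSFA, MFA, or TRG design itself. It assumes that each object proposal is a plain feature vector, that similarity is measured with cosine similarity, and that the weighted context is added residually to the current-frame feature. The names `cosine`, `softmax`, `aggregate`, and `temperature` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u > 0 and norm_v > 0 else 0.0


def softmax(scores):
    """Numerically stable softmax over a non-empty list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate(current, supports, temperature=1.0):
    """Add to each current-frame proposal a similarity-weighted sum of support proposals.

    Assumption (not from the paper): weights are a softmax over cosine similarities,
    and the aggregated context is added residually to the original feature.
    `supports` must be non-empty.
    """
    enhanced = []
    for query in current:
        weights = softmax([cosine(query, s) / temperature for s in supports])
        context = [
            sum(w * s[d] for w, s in zip(weights, supports))
            for d in range(len(query))
        ]
        enhanced.append([q + c for q, c in zip(query, context)])
    return enhanced


# Hypothetical toy data: one current-frame proposal, two support-frame proposals.
current_feats = [[1.0, 0.0]]
support_feats = [[1.0, 0.0], [0.0, 1.0]]
result = aggregate(current_feats, support_feats)
```

In this toy case the support proposal that matches the current proposal receives the larger weight, so the enhanced feature leans toward the first dimension. A supervised relation term such as the one the abstract attributes to the TRG module would act on these similarity weights, but its form is not specified here.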
Pages: 146-157
Number of pages: 12