Class-Aware Dual-Supervised Aggregation Network for Video Object Detection

Cited by: 1

Authors:

Qi, Qiang [1, 2]

Yan, Yan [1, 2]

Wang, Hanzi [1, 2]

Affiliations:

[1] Xiamen Univ, Sch Informat, Fujian Key Lab Sensing & Comp Smart City, Xiamen 361005, Peoples R China

[2] Xiamen Univ, Key Lab Multimedia Trusted Percept & Efficient Com, Minist Educ China, Xiamen 361005, Peoples R China

Source:

IEEE TRANSACTIONS ON MULTIMEDIA | 2024 / Vol. 26

Funding:

National Natural Science Foundation of China;

Keywords:

Video object detection; distillation supervision; graph-guided feature aggregation; contrastive supervision; NAVIGATION;

DOI:

10.1109/TMM.2023.3292615

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Discipline classification code:

0812;

Abstract:

Video object detection has attracted increasing attention in recent years. Although great success has been achieved by off-the-shelf video object detection methods through delicately designing various types of feature aggregation, they overlook the class-aware supervision and thus still suffer from the problem of classification incapability, which means the classification between objects with deteriorated or similar appearances is error-prone. In this article, we propose a novel class-aware dual-supervised aggregation network (CDANet) for video object detection, including three substantial improvements to effectively alleviate the classification incapability problem of previous methods. First, we develop a class-aware cross-modality distillation supervision that transfers the semantic knowledge of label data to the features of video data, effectively enhancing the semantic representations of features. Second, we design a graph-guided feature aggregation module that effectively models the structural relations between features by leveraging the dynamic residual graph convolutional network, enabling our CDANet to perform more effective feature aggregation in the temporal domain. Third, we present a class-aware proposal contrastive supervision to maximize the intra-class agreement and inter-class disagreement, which is conducive to improving the semantic discriminability of features. The class-aware dual supervision and feature aggregation are tightly tied into a unified end-to-end framework to make our CDANet fully exploit class-specific semantic knowledge and inter-frame temporal dependencies to enhance object appearance representations, which facilitates the classification of detected objects. We conduct experiments on the challenging ImageNet VID dataset, and the results demonstrate the superiority of our CDANet against state-of-the-art methods. More remarkably, our CDANet achieves 85.4% mAP with ResNet-101 or 86.5% mAP with ResNeXt-101.
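The abstract's third component, a contrastive supervision that maximizes intra-class agreement and inter-class disagreement, can be illustrated with a generic supervised contrastive loss. The sketch below is not the CDANet formulation, which this record does not give; the function name `supervised_contrastive_loss`, the `temperature` parameter, and the toy feature vectors are all assumptions made for illustration.

```python
import math


def supervised_contrastive_loss(features, labels, temperature=0.1):
    """Generic supervised contrastive loss (illustrative, not the paper's exact loss).

    features: list of equal-length float lists, one per proposal (hypothetical input).
    labels:   list of integer class ids, same length as features.
    Returns the mean loss over anchors that have at least one same-class partner.
    """
    # L2-normalise each vector so the dot product equals cosine similarity.
    normed = []
    for f in features:
        norm = math.sqrt(sum(x * x for x in f)) or 1.0
        normed.append([x / norm for x in f])

    n = len(normed)
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue  # this anchor has no same-class partner
        # Temperature-scaled similarity of the anchor to every other sample.
        logits = {
            j: sum(a * b for a, b in zip(normed[i], normed[j])) / temperature
            for j in range(n)
            if j != i
        }
        # Log-sum-exp with max subtraction for numerical stability.
        m = max(logits.values())
        log_denom = m + math.log(sum(math.exp(v - m) for v in logits.values()))
        # Average negative log-probability assigned to the positives.
        total += -sum(logits[j] - log_denom for j in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0


# Toy proposal features: two tight clusters in 2-D.
feats = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
loss_matched = supervised_contrastive_loss(feats, [0, 0, 1, 1])
loss_mismatched = supervised_contrastive_loss(feats, [0, 1, 0, 1])
```

With these toy vectors, labels that match the clusters give a much lower loss than labels that cut across them, which is the behavior such a term is meant to reward during training.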


Pages: 2109-2123

Page count: 15

50 records in total

[1] Class-Aware Feature Aggregation Network for Video Object Detection
Han, Liang
Wang, Pichao
Yin, Zhaozheng
Wang, Fan
Li, Hao
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2022, 32 (12) : 8165 - 8178
[2] Class-aware Object Counting
Michel, Andreas
Gross, Wolfgang
Schenkel, Fabian
Middelmann, Wolfgang
2022 IEEE/CVF WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION WORKSHOPS (WACVW 2022), 2022, : 469 - 478
[3] Class-Aware Robust Adversarial Training for Object Detection
Chen, Pin-Chun
Kung, Bo-Han
Chen, Jun-Cheng
2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021, 2021, : 10415 - 10424
[4] Dual Class-Aware Contrastive Federated Semi-Supervised Learning
Guo, Qi
Wu, Di
Qi, Yong
Qi, Saiyu
IEEE TRANSACTIONS ON MOBILE COMPUTING, 2025, 24 (02) : 1073 - 1089
[5] A difference enhancement and class-aware rebalancing semi-supervised network for cropland semantic change detection
Dai, Anjin
Yang, Jianyu
Zhang, Yuxuan
Zhang, Tingting
Tang, Kaixuan
Xiao, Xiangyi
Zhang, Shuoji
INTERNATIONAL JOURNAL OF APPLIED EARTH OBSERVATION AND GEOINFORMATION, 2025, 137
[6] CASN: Class-Aware Score Network for Textual Adversarial Detection
Bao, Rong
Zheng, Rui
Ding, Liang
Zhang, Qi
Tao, Dacheng
PROCEEDINGS OF THE 61ST ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, ACL 2023, VOL 1, 2023, : 671 - 687
[7] Class-Aware Contrastive Semi-Supervised Learning
Yang, Fan
Wu, Kai
Zhang, Shuyi
Jiang, Guannan
Liu, Yong
Zheng, Feng
Zhang, Wei
Wang, Chengjie
Zeng, Long
2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2022, : 14401 - 14410
[8] DUALFEAT: DUAL FEATURE AGGREGATION FOR VIDEO OBJECT DETECTION
Pan, Jing
Du, Kaiwen
Yan, Yan
Wang, Hanzi
2022 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, ICIP, 2022, : 2901 - 2905
[9] Layerwise Class-Aware Convolutional Neural Network
Cui, Zhen
Niu, Zhiheng
Liu, Luoqi
Yan, Shuicheng
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2017, 27 (12) : 2601 - 2612
[10] Class-aware Memory Guided Unbiased Weighting for Universal Domain Adaptive Object Detection
Lang, Qinghai
He, Zhenwei
Fu, Xiaowei
Zhang, Lei
2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION WORKSHOPS, ICCVW, 2023, : 4347 - 4356
