CTOD: Cross-Attentive Task-Alignment for One-Stage Object Detection

Times cited: 0

Authors:

Yao, Ruilin [1]

Rong, Yi [1, 2, 3]

Huang, Qiangqiang [1]

Xiong, Shengwu [1, 2, 3]

Affiliations:

[1] Wuhan Univ Technol, Sch Comp Sci & Artificial Intelligence, Wuhan 430070, Peoples R China

[2] Wuhan Univ Technol, Sanya Sci & Educ Innovat Pk, Sanya 572000, Peoples R China

[3] Shanghai Artificial Intelligence Lab, Shanghai 200232, Peoples R China

Source:

IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY | 2024 / Vol. 34 / No. 11

Funding:

National Natural Science Foundation of China

Keywords:

One-stage object detection; task-alignment; cross-attention; spatial feature aggregation

DOI:

10.1109/TCSVT.2024.3422879

Chinese Library Classification (CLC) number:

TM [Electrical engineering]; TN [Electronics and communication technology]

Discipline classification codes:

0808; 0809

Abstract:

Existing one-stage object detectors are commonly implemented in a multi-task learning manner, simultaneously solving two different sub-tasks: object classification and localization. To this end, detection heads with two independent branches are typically used to extract task-specific image features separately. However, because the parallel branches do not interact, the differing learning objectives of classification and localization lead to spatial misalignment between the predictions of the two tasks. In this work, we propose a novel Cross-attentive Task-aligned Object Detection (CTOD) method that addresses this problem by explicitly promoting prediction consistency between the two tasks. Specifically, we first design a Dual Task Interaction (DTI) module, which uses a task cross-attention mechanism to generate task-interactive embeddings for each branch from the task-specific features. Based on these embeddings, we then propose a Spatial Feature Aggregation (SFA) module that computes offsets and weights to aggregate information from nearby feature points at each spatial location of the task-specific feature maps. In addition, we generate adjustment parameters from the task-interactive embeddings to align the final predictions of the two tasks, which are obtained from the enhanced task-specific features. Extensive experiments are conducted on the MS-COCO dataset. With a ResNeXt-101-64x4d-DCN backbone, CTOD achieves 51.8 AP with single-model, single-scale testing, outperforming the recently proposed one-stage detectors ATSS, VFNet, LD and TOOD by 4.1, 1.9, 1.3 and 0.7 AP, respectively. Qualitative analysis further illustrates the effectiveness and superiority of CTOD in addressing the task misalignment problem in object detection. Our code is available at https://github.com/Mr-Bigworth/CTOD.
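The abstract describes the DTI and SFA modules only at a high level. The NumPy sketch below illustrates the two general techniques it names: single-head cross-attention between the classification and localization branches, and aggregation of offset-sampled, weighted neighbours at each location (in the style of modulated deformable sampling). All function names, the projection matrices, the number of sampling points K, and the linear heads that predict offsets and weights are hypothetical assumptions for illustration, not the authors' implementation in the linked repository.

```python
import numpy as np

# ASSUMPTION: illustrative sketch only; names, shapes and projections are
# hypothetical and not taken from the CTOD code release.


def softmax(x, axis=-1):
    """Numerically stable softmax along `axis`."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def task_cross_attention(f_query, f_context, w_q, w_k, w_v):
    """Single-head cross-attention between two task branches (hypothetical).

    f_query, f_context: (H*W, C) flattened task-specific features.
    Queries come from one task; keys and values come from the other task,
    so the output mixes information across branches.
    """
    q, k, v = f_query @ w_q, f_context @ w_k, f_context @ w_v
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]), axis=-1)  # (HW, HW)
    return attn @ v  # task-interactive embedding, (HW, C)


def bilinear_sample(feat, y, x):
    """Bilinearly sample feat (C, H, W) at (y, x); points outside count as 0."""
    _, h, w = feat.shape
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    out = np.zeros(feat.shape[0])
    for yy in (y0, y0 + 1):
        for xx in (x0, x0 + 1):
            if 0 <= yy < h and 0 <= xx < w:
                out += (1 - abs(y - yy)) * (1 - abs(x - xx)) * feat[:, yy, xx]
    return out


def spatial_aggregate(feat, offsets, weights):
    """Weighted sum of K offset-sampled neighbours at every location.

    feat: (C, H, W); offsets: (H, W, K, 2) as (dy, dx); weights: (H, W, K).
    """
    _, h, w = feat.shape
    out = np.zeros(feat.shape)
    for i in range(h):
        for j in range(w):
            for k in range(offsets.shape[2]):
                dy, dx = offsets[i, j, k]
                out[:, i, j] += weights[i, j, k] * bilinear_sample(feat, i + dy, j + dx)
    return out


rng = np.random.default_rng(0)
C, H, W, K = 8, 4, 4, 3
f_cls = rng.standard_normal((C, H, W))  # classification-branch features
f_reg = rng.standard_normal((C, H, W))  # localization-branch features
w_q, w_k, w_v = (0.1 * rng.standard_normal((C, C)) for _ in range(3))

# Classification-side embedding: queries from cls, keys/values from reg.
emb_cls = task_cross_attention(f_cls.reshape(C, -1).T, f_reg.reshape(C, -1).T, w_q, w_k, w_v)

# HYPOTHETICAL linear heads predicting K offsets and K weights per location.
w_off = 0.1 * rng.standard_normal((C, 2 * K))
w_wgt = 0.1 * rng.standard_normal((C, K))
offsets = (emb_cls @ w_off).reshape(H, W, K, 2)
weights = softmax((emb_cls @ w_wgt).reshape(H, W, K), axis=-1)

enhanced_cls = spatial_aggregate(f_cls, offsets, weights)
print(enhanced_cls.shape)  # (8, 4, 4)
```

The sketch computes only the classification-side embedding; a localization-side embedding would swap the roles of the two inputs. How CTOD derives its final adjustment parameters from the embeddings is not specified in the abstract and is not modeled here.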

Pages: 11507-11520

Number of pages: 14
