CTOD: Cross-Attentive Task-Alignment for One-Stage Object Detection

Cited by: 0

Authors:

Yao, Ruilin [1]

Rong, Yi [1,2,3]

Huang, Qiangqiang [1]

Xiong, Shengwu [1,2,3]

Affiliations:

[1] Wuhan Univ Technol, Sch Comp Sci & Artificial Intelligence, Wuhan 430070, Peoples R China

[2] Wuhan Univ Technol, Sanya Sci & Educ Innovat Pk, Sanya 572000, Peoples R China

[3] Shanghai Artificial Intelligence Lab, Shanghai 200232, Peoples R China

Source:

IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY | 2024, Vol. 34, No. 11

Funding:

National Natural Science Foundation of China;

Keywords:

One-stage object detection; task-alignment; cross-attention; spatial feature aggregation;

DOI:

10.1109/TCSVT.2024.3422879

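Chinese Library Classification (CLC):

TM [Electrical Engineering]; TN [Electronics and Communication Technology];

Subject classification codes:

0808; 0809;

Abstract: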

Existing one-stage object detectors are commonly implemented in a multi-task learning based manner, which simultaneously solves two different sub-tasks: object classification and localization. To achieve this, the detection heads with two independent branches are typically utilized to extract specific image features for each task separately. However, due to the lack of interaction between the parallel branches, the difference in learning objectives of classification and localization will lead to spatial misalignment between the predictions of these two tasks. In this work, we propose a novel Cross-attentive Task-aligned Object Detection (CTOD) method to handle this problem by explicitly promoting the prediction consistency for both tasks. Specifically, we first design a Dual Task Interaction (DTI) module, which generates task-interactive embeddings for each branch from task-specific features by using a task cross-attention mechanism. Then based on these embeddings, we propose a Spatial Feature Aggregation (SFA) module that calculates offsets and weights to aggregate information from nearby feature points at each spatial location of the task-specific feature maps. Meanwhile, we also generate adjustment parameters from the task-interactive embeddings to finally align the prediction results of the two tasks obtained from the enhanced task-specific features described above. Extensive experiments are conducted on the MS-COCO dataset. When using ResNeXt-101-64x4d-DCN as the backbone, our CTOD method achieves a detection result of 51.8 AP with single-model and single-scale testing, outperforming the recently proposed one-stage detectors ATSS, VFNet, LD and TOOD by 4.1, 1.9, 1.3 and 0.7 AP, respectively. The analysis of qualitative results also illustrates the effectiveness and superiority of CTOD in solving the task misalignment problem for object detection. Our code is available at https://github.com/Mr-Bigworth/CTOD.
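
The task cross-attention idea mentioned in the abstract can be illustrated with a minimal, dependency-free sketch. This is not the authors' DTI module: it assumes plain scaled dot-product attention with no learned projections, and the names `cross_attention` and `dual_task_interaction` are hypothetical. Each branch's features act as queries over the other branch's features, so every output embedding mixes information from both tasks.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Scaled dot-product attention: every query attends over all keys.

    queries: list of d-dim vectors; keys: list of d-dim vectors;
    values: list of vectors, one per key. Returns one vector per query.
    """
    dim = len(queries[0])
    outputs = []
    for q in queries:
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(dim)
            for k in keys
        ]
        weights = softmax(scores)
        # Weighted sum of the value vectors, one coordinate at a time.
        outputs.append([
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ])
    return outputs


def dual_task_interaction(cls_feats, loc_feats):
    """Hypothetical stand-in for a task-interaction step (assumption).

    Classification features query localization features and vice versa.
    """
    cls_emb = cross_attention(cls_feats, loc_feats, loc_feats)
    loc_emb = cross_attention(loc_feats, cls_feats, cls_feats)
    return cls_emb, loc_emb


if __name__ == "__main__":
    cls_feats = [[1.0, 0.0], [0.0, 1.0]]
    loc_feats = [[2.0, 0.0], [0.0, 2.0], [1.0, 1.0]]
    cls_emb, loc_emb = dual_task_interaction(cls_feats, loc_feats)
    print(cls_emb)
    print(loc_emb)
```

A real detection head would add learned query, key and value projections and operate on batched feature maps; the sketch only shows the direction of information flow between the two branches.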


Pages: 11507-11520

Page count: 14

