CTOD: Cross-Attentive Task-Alignment for One-Stage Object Detection

Cited by: 0
Authors
Yao, Ruilin [1]
Rong, Yi [1, 2, 3]
Huang, Qiangqiang [1]
Xiong, Shengwu [1, 2, 3]
Affiliations
[1] Wuhan Univ Technol, Sch Comp Sci & Artificial Intelligence, Wuhan 430070, Peoples R China
[2] Wuhan Univ Technol, Sanya Sci & Educ Innovat Pk, Sanya 572000, Peoples R China
[3] Shanghai Artificial Intelligence Lab, Shanghai 200232, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
One-stage object detection; task-alignment; cross-attention; spatial feature aggregation
DOI
10.1109/TCSVT.2024.3422879
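Chinese Library Classification
TM [Electrical engineering]; TN [Electronic technology, communication technology]
Discipline classification codes
0808; 0809
Abstract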
Existing one-stage object detectors are commonly implemented in a multi-task learning based manner, which simultaneously solves two different sub-tasks: object classification and localization. To achieve this, the detection heads with two independent branches are typically utilized to extract specific image features for each task separately. However, due to the lack of interaction between the parallel branches, the difference in learning objectives of classification and localization will lead to spatial misalignment between the predictions of these two tasks. In this work, we propose a novel Cross-attentive Task-aligned Object Detection (CTOD) method to handle this problem by explicitly promoting the prediction consistency for both tasks. Specifically, we first design a Dual Task Interaction (DTI) module, which generates task-interactive embeddings for each branch from task-specific features by using a task cross-attention mechanism. Then based on these embeddings, we propose a Spatial Feature Aggregation (SFA) module that calculates offsets and weights to aggregate information from nearby feature points at each spatial location of the task-specific feature maps. Meanwhile, we also generate adjustment parameters from the task-interactive embeddings to finally align the prediction results of the two tasks obtained from the enhanced task-specific features described above. Extensive experiments are conducted on the MS-COCO dataset. When using ResNeXt-101-64x4d-DCN as the backbone, our CTOD method achieves a detection result of 51.8 AP with single-model and single-scale testing, outperforming the recently proposed one-stage detectors ATSS, VFNet, LD and TOOD by 4.1, 1.9, 1.3 and 0.7 AP, respectively. The analysis of qualitative results also illustrates the effectiveness and superiority of CTOD in solving the task misalignment problem for object detection. Our code is available at https://github.com/Mr-Bigworth/CTOD.
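The task cross-attention idea mentioned in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' DTI module: it is plain single-head scaled dot-product cross-attention, in which features from one branch act as queries over the other branch's features. The function names, tensor shapes, and random projection weights below are assumptions made for illustration only.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attend(query_feats, context_feats, w_q, w_k, w_v):
    """Single-head cross-attention (hypothetical helper).

    query_feats, context_feats: arrays of shape (N, C), one row per
    spatial location of a flattened feature map.
    Returns the attended embeddings (N, D) and the attention map (N, N).
    """
    q = query_feats @ w_q
    k = context_feats @ w_k
    v = context_feats @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])
    attn = softmax(scores, axis=-1)
    return attn @ v, attn

def make_weights(rng, channels, dim):
    # Random projections stand in for learned parameters (assumption).
    return tuple(rng.standard_normal((channels, dim)) * 0.1 for _ in range(3))

rng = np.random.default_rng(0)
n_points, channels, dim = 16, 8, 8

# Stand-ins for task-specific features from the two branches of the head.
cls_feats = rng.standard_normal((n_points, channels))
loc_feats = rng.standard_normal((n_points, channels))

# Each direction gets its own projection weights in this sketch.
cls_embed, cls_attn = cross_attend(cls_feats, loc_feats,
                                   *make_weights(rng, channels, dim))
loc_embed, loc_attn = cross_attend(loc_feats, cls_feats,
                                   *make_weights(rng, channels, dim))

print(cls_embed.shape, loc_embed.shape)
```

In this sketch each branch's embedding is a weighted mix of the other branch's values, which is one simple way the two tasks could exchange information. How the paper derives offsets, aggregation weights, and adjustment parameters from such embeddings is not specified in the abstract and is not modeled here.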
Pages: 11507-11520
Page count: 14