Multi-scale coupled attention for visual object detection

Cited by: 2
Authors
Li, Fei [1]
Yan, Hongping [2]
Shi, Linsu [1]
Affiliations
[1] China Tower Corp Ltd, 9 Dongran North St, Beijing 100195, Peoples R China
[2] China Univ Geosci, Xueyuan Rd 29, Beijing 100083, Peoples R China
Source
SCIENTIFIC REPORTS | 2024 / Vol. 14 / Issue 01
Keywords
Attention mechanism; Deep neural networks; Object detection; Self-attention learning; Transformer; YOLO;
DOI
10.1038/s41598-024-60897-8
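Chinese Library Classification (CLC) number
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences];
Discipline classification code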
07 ; 0710 ; 09 ;
Abstract
Deep neural networks have achieved remarkable success in object detection. However, network structures still need to be evolved continually and finely tuned to achieve better performance. This is driven by the continuing demand for high performance in complex scenes, where the multi-scale objects to be detected are scattered across the scene. To this end, this paper proposes a network structure called Multi-Scale Coupled Attention (MSCA) under the framework of self-attention learning with methodologies of importance assessment. Architecturally, it consists of a Multi-Scale Coupled Channel Attention (MSCCA) module and a Multi-Scale Coupled Spatial Attention (MSCSA) module. Specifically, the MSCCA module is developed to perform self-attention learning linearly on the multi-scale channels. In parallel, the MSCSA module is constructed to perform self-attention learning nonlinearly on the multi-scale spatial grids. The MSCCA and MSCSA modules can be connected in sequence and used as a plugin to develop end-to-end learning models for object detection. Finally, the proposed network is compared on two public datasets with 13 classical or state-of-the-art models, including Faster R-CNN, Cascade R-CNN, RetinaNet, SSD, PP-YOLO, YOLO v3, YOLO v5, YOLO v7, YOLOX, DETR, conditional DETR, UP-DETR and FP-DETR. Comparative experimental results with numerical scores, the ablation study, and the performance behaviour all demonstrate the effectiveness of the proposed model.
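The abstract describes a channel-attention stage followed by a spatial-attention stage, applied to multi-scale features as a plugin. The sketch below illustrates only that general pattern in plain Python; it is not the paper's MSCCA or MSCSA. The function names, the gating formulas, the `weights` matrix, and the toy `pyramid` are all assumptions made for illustration, and each scale is processed independently, so the cross-scale coupling the paper proposes is not modeled here.

```python
import math


def sigmoid(x):
    """Logistic function; maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def channel_attention(fmap, weights):
    """Gate each channel with a linear mix of pooled channel descriptors.

    fmap:    list of C channels, each an H x W nested list of floats.
    weights: hypothetical C x C matrix (assumed, not taken from the paper).
    """
    # Global average pooling gives one scalar descriptor per channel.
    pooled = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in fmap]
    # Linear map over the descriptors, squashed to a (0, 1) gate per channel.
    gates = [sigmoid(sum(w * p for w, p in zip(w_row, pooled))) for w_row in weights]
    return [[[g * v for v in row] for row in ch] for ch, g in zip(fmap, gates)]


def spatial_attention(fmap):
    """Gate each grid cell with a nonlinear function of its channel statistics."""
    n_ch, height, width = len(fmap), len(fmap[0]), len(fmap[0][0])
    out = [[[0.0] * width for _ in range(height)] for _ in range(n_ch)]
    for i in range(height):
        for j in range(width):
            vals = [fmap[c][i][j] for c in range(n_ch)]
            # Assumed nonlinearity: tanh of the channel mean plus the channel max.
            gate = sigmoid(math.tanh(sum(vals) / n_ch) + max(vals))
            for c in range(n_ch):
                out[c][i][j] = gate * fmap[c][i][j]
    return out


def attention_plugin(pyramid, weights):
    """Apply channel attention, then spatial attention, to every scale."""
    return [spatial_attention(channel_attention(f, weights)) for f in pyramid]


# Toy two-scale feature pyramid: 2 channels at 4x4 and at 2x2.
pyramid = [
    [[[1.0] * 4 for _ in range(4)], [[2.0] * 4 for _ in range(4)]],
    [[[0.5] * 2 for _ in range(2)], [[-1.0] * 2 for _ in range(2)]],
]
weights = [[1.0, 0.0], [0.0, 1.0]]  # identity mix, purely illustrative
refined = attention_plugin(pyramid, weights)
```

Because both stages only rescale their input, the output keeps the shape of each feature map, which is what lets a block of this kind be dropped between existing layers of a detector.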
Pages: 19
Related papers
50 in total
  • [41] Visual Attention Guided Multi-Scale Boundary Detection in Natural Images for Contour Grouping
    Zhong, Jingjing
    Luo, Siwei
    Zou, Qi
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2009, E92D (03): 555 - 558
  • [42] Language conditioned multi-scale visual attention networks for visual grounding
    Yao, Haibo
    Wang, Lipeng
    Cai, Chengtao
    Wang, Wei
    Zhang, Zhi
    Shang, Xiaobing
    IMAGE AND VISION COMPUTING, 2024, 150
  • [43] Multi-scale single object tracking based on the attention mechanism
    Song, Jianfeng
    Miao, Qiguang
    Wang, Chongxiao
    Xu, Hao
    Yang, Jin
    Xi'an Dianzi Keji Daxue Xuebao/Journal of Xidian University, 2021, 48 (05): 110 - 116
  • [44] ADMNet: Attention-Guided Densely Multi-Scale Network for Lightweight Salient Object Detection
    Zhou, Xiaofei
    Shen, Kunye
    Liu, Zhi
    IEEE TRANSACTIONS ON MULTIMEDIA, 2024, 26 : 10828 - 10841
  • [45] Multi-scale attention and boundary enhancement with long-range dependency for salient object detection
    Yu, Ming
    Lin, Xiaoqing
    Liu, Yi
    Guo, Yingchun
    JOURNAL OF INTELLIGENT & FUZZY SYSTEMS, 2023, 44 (06) : 8957 - 8969
  • [46] Small object detection in unmanned aerial vehicle images using multi-scale hybrid attention
    Song, Gang
    Du, Hongwei
    Zhang, Xinyue
    Bao, Fangxun
    Zhang, Yunfeng
    ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2024, 128
  • [47] DMA-Net: Decoupled Multi-Scale Attention for Few-Shot Object Detection
    Xie, Xijun
    Lee, Feifei
    Chen, Qiu
    APPLIED SCIENCES-BASEL, 2023, 13 (12):
  • [48] DMA-YOLO: multi-scale object detection method with attention mechanism for aerial images
    Li, Ya-ling
    Feng, Yong
    Zhou, Ming-liang
    Xiong, Xian-cai
    Wang, Yong-heng
    Qiang, Bao-hua
    VISUAL COMPUTER, 2024, 40 (06): 4505 - 4518
  • [49] Exploring multi-scale deformable context and channel-wise attention for salient object detection
    Liu, Yi
    Duanmu, Mingxing
    Huo, Zhen
    Qi, Hang
    Chen, Zuntian
    Li, Lei
    Zhang, Qiang
    NEUROCOMPUTING, 2021, 428 (428) : 92 - 103
  • [50] Composite Backbone Small Object Detection Based on Context and Multi-Scale Information with Attention Mechanism
    Jing, Xinhan
    Liu, Xuesong
    Liu, Baolin
    MATHEMATICS, 2024, 12 (05)