Combining transformer global and local feature extraction for object detection

Cited by: 6
Authors
Li, Tianping [1]
Zhang, Zhenyi [1]
Zhu, Mengdi [1]
Cui, Zhaotong [1]
Wei, Dongmei [1]
Affiliation
[1] Shandong Normal Univ, Sch Phys & Elect, Jinan, Shandong, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Object detection; Attention mechanism; Transformer; Anchor-free; Detector head;
DOI
10.1007/s40747-024-01409-z
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Convolutional neural network (CNN)-based object detectors perform well but are weak at global feature extraction and cannot establish global dependencies between object pixels. The Transformer can compensate for this, but it does not incorporate the advantages of convolution, so it captures too little detail about local features and suffers from slow speed and a large number of parameters. In addition, the Feature Pyramid Network (FPN) lacks information interaction across layers, which can reduce the acquisition of feature context information. To address these problems, this paper proposes a CNN-based anchor-free object detector that combines transformer global and local feature extraction (GLFT) to enhance the extraction of semantic information from images. First, the segmented channel extraction feature attention (SCEFA) module is designed to improve the model's extraction of local multiscale channel features and to enhance the discrimination of pixels in the object region. Second, the aggregated feature hybrid transformer (AFHTrans) module, which is combined with convolution, is designed to enhance the model's extraction of global and local feature information and to establish dependencies between distant object pixels. This module also compensates for the shortcomings of the FPN by aggregating and transmitting information across multiple layers. Compared with a transformer, these methods have obvious advantages. Finally, the feature extraction head (FE-Head) is designed to extract full-text information based on the features of different tasks. The detector achieves an accuracy of 47.0% on COCO2017 and 82.76% on PASCAL VOC2007 + 2012, and the experimental results validate the effectiveness of the method.
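The idea of pairing a convolutional (local) branch with a self-attention (global) branch can be illustrated with a toy example. The sketch below is not the paper's AFHTrans module, whose internals are not given in this record. It is a generic, dependency-free illustration on a 1-D token sequence, and every name in it (`local_conv1d`, `global_self_attention`, `hybrid_block`), the identity Q/K/V projections, the smoothing kernel, and the residual sum are assumptions made for illustration only.

```python
import math


def local_conv1d(tokens, kernel):
    """Local branch: 1-D convolution over neighbouring tokens (zero padding)."""
    n, d = len(tokens), len(tokens[0])
    half = len(kernel) // 2
    out = []
    for i in range(n):
        acc = [0.0] * d
        for k, w in enumerate(kernel):
            j = i + k - half  # index of the neighbour covered by this tap
            if 0 <= j < n:
                for c in range(d):
                    acc[c] += w * tokens[j][c]
        out.append(acc)
    return out


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def global_self_attention(tokens):
    """Global branch: scaled dot-product self-attention.

    Assumption: Q, K and V are the tokens themselves (identity projections),
    so every token can attend to every other token regardless of distance.
    """
    d = len(tokens[0])
    scale = 1.0 / math.sqrt(d)
    out = []
    for q in tokens:
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in tokens]
        weights = softmax(scores)
        out.append([sum(w * v[c] for w, v in zip(weights, tokens))
                    for c in range(d)])
    return out


def hybrid_block(tokens, kernel=(0.25, 0.5, 0.25)):
    """Hypothetical fusion: residual sum of the input, local and global branches."""
    local = local_conv1d(tokens, kernel)
    glob = global_self_attention(tokens)
    return [[x + l + g for x, l, g in zip(t, lo, gl)]
            for t, lo, gl in zip(tokens, local, glob)]


if __name__ == "__main__":
    features = [[0.0, 1.0], [1.0, 0.0], [0.5, 0.5], [2.0, -1.0]]
    for row in hybrid_block(features):
        print([round(v, 3) for v in row])
```

The local branch only sees a three-token window, while the attention branch mixes information from the whole sequence. That contrast is the motivation the abstract gives for combining the two.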
Pages: 4897 - 4920
Number of pages: 24
Related papers
50 in total
  • [1] Local to Global Feature Learning for Salient Object Detection
    Feng, Xuelu
    Zhou, Sanping
    Zhu, Zixin
    Wang, Le
    Hua, Gang
    PATTERN RECOGNITION LETTERS, 2022, 162 : 81 - 88
  • [2] Combining Object-Based Local and Global Feature Statistics for Salient Object Search
    Naqvi, Syed S.
    Browne, Will N.
    Hollitt, Christopher
    PROCEEDINGS OF 2013 28TH INTERNATIONAL CONFERENCE ON IMAGE AND VISION COMPUTING NEW ZEALAND (IVCNZ 2013), 2013 : 394 - 399
  • [3] Low-Light Object Detection Combining Transformer and Dynamic Feature Fusion
    Cai, Teng
    Chen, Cifa
    Dong, Fangmin
    Computer Engineering and Applications, 2024, 60 (09) : 135 - 141
  • [4] Combining local and global information for product feature extraction in opinion documents
    Yang, Liang
    Liu, Bing
    Lin, Hongfei
    Lin, Yuan
    INFORMATION PROCESSING LETTERS, 2016, 116 (10) : 623 - 627
  • [5] Transformer Based Remote Sensing Object Detection With Enhanced Multispectral Feature Extraction
    Zhu, Jiahe
    Chen, Xu
    Zhang, Huan
    Tan, Zelong
    Wang, Shengjin
    Ma, Hongbing
    IEEE GEOSCIENCE AND REMOTE SENSING LETTERS, 2023, 20
  • [6] Fall detection algorithm based on global and local feature extraction
    Li, Bin
    Li, Jiangjiao
    Wang, Peng
    PATTERN RECOGNITION LETTERS, 2024, 185 : 31 - 37
  • [7] HA-Transformer: Harmonious aggregation from local to global for object detection
    Chen, Yang
    Chen, Sihan
    Deng, Yongqiang
    Wang, Kunfeng
    EXPERT SYSTEMS WITH APPLICATIONS, 2023, 230
  • [8] A Hybrid CNN-Transformer Network for Object Detection in Optical Remote Sensing Images: Integrating Local and Global Feature Fusion
    Huang, Youxiang
    Jiao, Donglai
    Huang, Xingru
    Tang, Tiantian
    Gui, Guan
    IEEE JOURNAL OF SELECTED TOPICS IN APPLIED EARTH OBSERVATIONS AND REMOTE SENSING, 2025, 18 : 241 - 254
  • [9] Cross-domain object detection by local to global object-aware feature alignment
    Yiguo Song
    Zhenyu Liu
    Ruining Tang
    Guifang Duan
    Jianrong Tan
    Neural Computing and Applications, 2024, 36 : 3631 - 3644
  • [10] Cross-domain object detection by local to global object-aware feature alignment
    Song, Yiguo
    Liu, Zhenyu
    Tang, Ruining
    Duan, Guifang
    Tan, Jianrong
    NEURAL COMPUTING & APPLICATIONS, 2024, 36 (07) : 3631 - 3644