Towards Efficient Use of Multi-Scale Features in Transformer-Based Object Detectors

Cited by: 14
Authors
Zhang, Gongjie [1 ,2 ]
Luo, Zhipeng [1 ,3 ]
Tian, Zichen [1 ]
Zhang, Jingyi [1 ]
Zhang, Xiaoqin [4 ]
Lu, Shijian [1 ]
Affiliations
[1] Nanyang Technol Univ, S Lab, Singapore, Singapore
[2] Black Sesame Technol, Singapore, Singapore
[3] SenseTime Res, Hong Kong, Peoples R China
[4] Wenzhou Univ, Wenzhou, Peoples R China
Keywords
DOI
10.1109/CVPR52729.2023.00601
Chinese Library Classification
TP18 [Theory of Artificial Intelligence];
Discipline codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Multi-scale features have been proven highly effective for object detection but often come with huge and even prohibitive extra computation costs, especially for the recent Transformer-based detectors. In this paper, we propose Iterative Multi-scale Feature Aggregation (IMFA) - a generic paradigm that enables efficient use of multi-scale features in Transformer-based object detectors. The core idea is to exploit sparse multi-scale features from just a few crucial locations, and it is achieved with two novel designs. First, IMFA rearranges the Transformer encoder-decoder pipeline so that the encoded features can be iteratively updated based on the detection predictions. Second, IMFA sparsely samples scale-adaptive features for refined detection from just a few keypoint locations under the guidance of prior detection predictions. As a result, the sampled multi-scale features are sparse yet still highly beneficial for object detection. Extensive experiments show that the proposed IMFA boosts the performance of multiple Transformer-based object detectors significantly yet with only slight computational overhead.
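The core idea in the abstract — sampling sparse, scale-adaptive features at only a few predicted keypoint locations instead of processing full multi-scale feature maps — can be illustrated with a minimal sketch. This is not the authors' implementation; the function names (`bilinear_sample`, `sample_sparse_multiscale`) and the NumPy pyramid representation are illustrative assumptions.

```python
import numpy as np

def bilinear_sample(feat, x, y):
    """Bilinearly interpolate feature map feat (C, H, W) at fractional pixel (x, y)."""
    C, H, W = feat.shape
    x = float(np.clip(x, 0, W - 1))
    y = float(np.clip(y, 0, H - 1))
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    x1, y1 = min(x0 + 1, W - 1), min(y0 + 1, H - 1)
    wx, wy = x - x0, y - y0
    top = (1 - wx) * feat[:, y0, x0] + wx * feat[:, y0, x1]
    bot = (1 - wx) * feat[:, y1, x0] + wx * feat[:, y1, x1]
    return (1 - wy) * top + wy * bot

def sample_sparse_multiscale(pyramid, keypoints):
    """Sample each scale of a feature pyramid at a few keypoint locations.

    pyramid:   list of L feature maps, each (C, H_l, W_l) with different H_l, W_l.
    keypoints: (N, 2) array of normalized (x, y) coordinates in [0, 1],
               e.g. taken from prior detection predictions.
    Returns (N, L, C): one C-dim feature vector per keypoint per scale,
    so the cost scales with N keypoints rather than the full map area.
    """
    out = []
    for kx, ky in keypoints:
        per_scale = []
        for feat in pyramid:
            C, H, W = feat.shape
            # Map normalized coordinates into each scale's own pixel grid.
            per_scale.append(bilinear_sample(feat, kx * (W - 1), ky * (H - 1)))
        out.append(np.stack(per_scale))
    return np.stack(out)
```

In the paper's pipeline these sampled vectors would then be fed back into the Transformer decoder for refined detection; the sketch above only shows why the sampled features stay sparse, since only N × L vectors are gathered regardless of feature-map resolution.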
Pages: 6206-6216
Page count: 11
Related papers
50 records in total
  • [41] Data-efficient multi-scale fusion vision transformer
    Tang, Hao
    Liu, Dawei
    Shen, Chengchao
    PATTERN RECOGNITION, 2025, 161
  • [42] Multi-Scale Feature Attention-DEtection TRansformer: Multi-Scale Feature Attention for security check object detection
    Sima, Haifeng
    Chen, Bailiang
    Tang, Chaosheng
    Zhang, Yudong
    Sun, Junding
    IET COMPUTER VISION, 2024, 18 (05) : 613 - 625
  • [43] Mask DINO: Towards A Unified Transformer-based Framework for Object Detection and Segmentation
    Li, Feng
    Zhang, Hao
    Xu, Huaizhe
    Liu, Shilong
    Zhang, Lei
    Ni, Lionel M.
    Shum, Heung-Yeung
    2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR, 2023, : 3041 - 3050
  • [44] DMFC-UFormer: Depthwise multi-scale factorized convolution transformer-based UNet for medical image segmentation
    Garbaz, Anass
    Oukdach, Yassine
    Charfi, Said
    El Ansari, Mohamed
    Koutti, Lahcen
    Salihoun, Mouna
    BIOMEDICAL SIGNAL PROCESSING AND CONTROL, 2025, 101
  • [45] MO-Transformer: A Transformer-Based Multi-Object Point Cloud Reconstruction Network
    Lyu, Erli
    Zhang, Zhengyan
    Liu, Wei
    Wang, Jiaole
    Song, Shuang
    Meng, Max Q. -H.
    2022 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS (IROS), 2022, : 1024 - 1030
  • [46] Seismic Data Interpolation Based on Multi-Scale Transformer
    Guo, Yuanqi
    Fu, Lihua
    Li, Hongwei
    IEEE GEOSCIENCE AND REMOTE SENSING LETTERS, 2023, 20
  • [47] An efficient algorithm for multi-scale maritime object detection and recognition
    Liu, Yang
    Yi, Ran
    Ma, Ding
    Wang, Yongfu
    JOURNAL OF INTELLIGENT & FUZZY SYSTEMS, 2024, 46 (03) : 7259 - 7271
  • [48] Efficient Multi-scale POMDPs for Robotic Object Search and Delivery
    Holzherr, Luc
    Foerster, Julian
    Breyer, Michel
    Nieto, Juan
    Siegwart, Roland
    Chung, Jen Jen
    2021 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION (ICRA 2021), 2021, : 6585 - 6591
  • [49] Scale-aware token-matching for transformer-based object detector
    Jung, Aecheon
    Hong, Sungeun
    Hyun, Yoonsuk
    PATTERN RECOGNITION LETTERS, 2024, 185 : 197 - 202
  • [50] Temperature Tomography for Combustion Field Based on Hierarchical Vision Transformer and Multi-scale Features Merging
    Si J.
    Wang X.
    Cheng Y.
    Liu C.
    Dianzi Yu Xinxi Xuebao/Journal of Electronics and Information Technology, 2023, 45 (10): : 3511 - 3519