Towards Efficient Use of Multi-Scale Features in Transformer-Based Object Detectors

Cited by: 14
Authors
Zhang, Gongjie [1 ,2 ]
Luo, Zhipeng [1 ,3 ]
Tian, Zichen [1 ]
Zhang, Jingyi [1 ]
Zhang, Xiaoqin [4 ]
Lu, Shijian [1 ]
Affiliations
[1] Nanyang Technol Univ, S Lab, Singapore, Singapore
[2] Black Sesame Technol, Singapore, Singapore
[3] SenseTime Res, Hong Kong, Peoples R China
[4] Wenzhou Univ, Wenzhou, Peoples R China
DOI: 10.1109/CVPR52729.2023.00601
Chinese Library Classification (CLC): TP18 [Artificial Intelligence Theory]
Subject Classification Codes: 081104; 0812; 0835; 1405
Abstract
Multi-scale features have been proven highly effective for object detection but often come with huge and even prohibitive extra computation costs, especially for the recent Transformer-based detectors. In this paper, we propose Iterative Multi-scale Feature Aggregation (IMFA) - a generic paradigm that enables efficient use of multi-scale features in Transformer-based object detectors. The core idea is to exploit sparse multi-scale features from just a few crucial locations, and it is achieved with two novel designs. First, IMFA rearranges the Transformer encoder-decoder pipeline so that the encoded features can be iteratively updated based on the detection predictions. Second, IMFA sparsely samples scale-adaptive features for refined detection from just a few keypoint locations under the guidance of prior detection predictions. As a result, the sampled multi-scale features are sparse yet still highly beneficial for object detection. Extensive experiments show that the proposed IMFA boosts the performance of multiple Transformer-based object detectors significantly yet with only slight computational overhead.
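The abstract describes two designs: sparse, scale-adaptive feature sampling at a few locations predicted by a prior detection round, and iterative updating of the encoded features based on those predictions. The PyTorch sketch below illustrates only the sparse sampling step under assumed tensor shapes; the function name, shapes, and the single-point-per-query simplification are illustrative assumptions and not the authors' released implementation.

```python
import torch
import torch.nn.functional as F


def sample_sparse_multiscale(feature_pyramid, ref_points):
    """Sample per-level features at a few reference locations (hypothetical sketch).

    feature_pyramid: list of [B, C, H_l, W_l] tensors (multi-scale backbone features)
    ref_points:      [B, Q, 2] (x, y) centers in [0, 1], e.g. from prior detections
    returns:         [B, Q, L, C] sparse multi-scale features (L = number of levels)
    """
    sampled = []
    grid = ref_points * 2.0 - 1.0          # grid_sample expects coords in [-1, 1]
    grid = grid.unsqueeze(2)               # [B, Q, 1, 2]
    for feat in feature_pyramid:
        level = F.grid_sample(feat, grid, align_corners=False)   # [B, C, Q, 1]
        sampled.append(level.squeeze(-1).permute(0, 2, 1))       # [B, Q, C]
    return torch.stack(sampled, dim=2)     # [B, Q, L, C]


# Toy usage: two pyramid levels and three queries whose centers stand in for
# box centers predicted in a previous decoding round; in the paper's pipeline
# the sampled features would then be fed back for iterative refinement.
B, C, Q = 1, 256, 3
pyramid = [torch.randn(B, C, 64, 64), torch.randn(B, C, 32, 32)]
centers = torch.rand(B, Q, 2)
print(sample_sparse_multiscale(pyramid, centers).shape)   # torch.Size([1, 3, 2, 256])
```

Because only a handful of points per query are sampled rather than the full multi-scale feature maps, the extra cost scales with the number of sampled locations, which is consistent with the "slight computational overhead" claimed in the abstract.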
Pages: 6206-6216
Number of pages: 11
Related Papers (50 in total; items [31]-[40] shown)
  • [31] ACT-FRCNN: Progress Towards Transformer-Based Object Detection
    Zulfqar, Sukana
    Elgamal, Zenab
    Zia, Muhammad Azam
    Razzaq, Abdul
    Ullah, Sami
    Dawood, Hussain
    ALGORITHMS, 2024, 17 (11)
  • [32] An Efficient Image Retrieval System Based on Multi-Scale Shape Features
    Arjun, P.
    Mirnalinee, T. T.
    JOURNAL OF CIRCUITS SYSTEMS AND COMPUTERS, 2018, 27 (11)
  • [33] CervixFormer: A Multi-scale swin transformer-Based cervical pap-Smear WSI classification framework
    Khan, Anwar
    Han, Seunghyeon
    Ilyas, Naveed
    Lee, Yong-Moon
    Lee, Boreom
    COMPUTER METHODS AND PROGRAMS IN BIOMEDICINE, 2023, 240
  • [34] Effective Face Recognition using Adaptive Multi-scale Transformer-based Resnet with Optimal Pattern Extraction
    Shivaprakash, Santhosh
    Rajashekararadhya, Sannangi Viswaradhya
    INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2023, 14 (07) : 812 - 827
  • [35] The MS-RadarFormer: A Transformer-Based Multi-Scale Deep Learning Model for Radar Echo Extrapolation
    Geng, Huantong
    Wu, Fangli
    Zhuang, Xiaoran
    Geng, Liangchao
    Xie, Boyang
    Shi, Zhanpeng
    REMOTE SENSING, 2024, 16 (02)
  • [36] Human pose estimation in complex background videos via Transformer-based multi-scale feature integration
    Cheng, Chen
    Xu, Huahu
    DISPLAYS, 2024, 84
  • [37] More Efficient Encoder: Boosting Transformer-Based Multi-object Tracking Performance Through YOLOX
    Zheng, Le
    Mao, Yaobin
    Zheng, Mengjin
    PATTERN RECOGNITION AND COMPUTER VISION, PRCV 2024, PT XII, 2025, 15042 : 376 - 389
  • [38] Rethinking the multi-scale feature hierarchy in object detection transformer (DETR)
    Liu, Fanglin
    Zheng, Qinghe
    Tian, Xinyu
    Shu, Feng
    Jiang, Weiwei
    Wang, Miaohui
    Elhanashi, Abdussalam
    Saponara, Sergio
    APPLIED SOFT COMPUTING, 2025, 175
  • [39] Transformer-Based Multi-object Tracking in Unmanned Aerial Vehicles
    Li, Jiaxin
    Li, Hongjun
    PATTERN RECOGNITION AND COMPUTER VISION, PRCV 2023, PT VI, 2024, 14430 : 347 - 358
  • [40] MotionFormer: An Improved Transformer-Based Architecture for Multi-object Tracking
    Agrawal, Harshit
    Halder, Agrya
    Chattopadhyay, Pratik
    COMPUTER VISION AND IMAGE PROCESSING, CVIP 2023, PT III, 2024, 2011 : 212 - 224