Towards Efficient Use of Multi-Scale Features in Transformer-Based Object Detectors

Cited by: 14
Authors
Zhang, Gongjie [1 ,2 ]
Luo, Zhipeng [1 ,3 ]
Tian, Zichen [1 ]
Zhang, Jingyi [1 ]
Zhang, Xiaoqin [4 ]
Lu, Shijian [1 ]
Affiliations
[1] Nanyang Technol Univ, S Lab, Singapore, Singapore
[2] Black Sesame Technol, Singapore, Singapore
[3] SenseTime Res, Hong Kong, Peoples R China
[4] Wenzhou Univ, Wenzhou, Peoples R China
DOI
10.1109/CVPR52729.2023.00601
Chinese Library Classification
TP18 [Theory of Artificial Intelligence]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Multi-scale features have been proven highly effective for object detection but often come with huge and even prohibitive extra computation costs, especially for the recent Transformer-based detectors. In this paper, we propose Iterative Multi-scale Feature Aggregation (IMFA) - a generic paradigm that enables efficient use of multi-scale features in Transformer-based object detectors. The core idea is to exploit sparse multi-scale features from just a few crucial locations, and it is achieved with two novel designs. First, IMFA rearranges the Transformer encoder-decoder pipeline so that the encoded features can be iteratively updated based on the detection predictions. Second, IMFA sparsely samples scale-adaptive features for refined detection from just a few keypoint locations under the guidance of prior detection predictions. As a result, the sampled multi-scale features are sparse yet still highly beneficial for object detection. Extensive experiments show that the proposed IMFA boosts the performance of multiple Transformer-based object detectors significantly yet with only slight computational overhead.
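The abstract describes sampling sparse, scale-adaptive features at a few keypoint locations predicted by earlier detection results. Below is a minimal, hypothetical PyTorch sketch of just that sparse multi-scale sampling step, assuming a feature pyramid and normalized keypoint coordinates; the function name `sample_sparse_multiscale_features`, the tensor shapes, and the toy usage are illustrative assumptions, not the authors' implementation, and the paper's iterative encoder-decoder rearrangement is not shown.

```python
# Hypothetical sketch of sparse multi-scale feature sampling (not the authors' code).
import torch
import torch.nn.functional as F

def sample_sparse_multiscale_features(feature_pyramid, keypoints):
    """Bilinearly sample each pyramid level at sparse keypoint locations.

    feature_pyramid: list of tensors, each (B, C, H_l, W_l)
    keypoints:       (B, N, 2) normalized (x, y) coordinates in [0, 1]
    returns:         (B, N, L, C) sampled features, L = number of pyramid levels
    """
    # grid_sample expects coordinates in [-1, 1]
    grid = keypoints * 2.0 - 1.0        # (B, N, 2)
    grid = grid.unsqueeze(2)            # (B, N, 1, 2)
    sampled = []
    for feats in feature_pyramid:
        # grid_sample output: (B, C, N, 1) -> reshape to (B, N, C)
        s = F.grid_sample(feats, grid, align_corners=False)
        sampled.append(s.squeeze(-1).permute(0, 2, 1))
    return torch.stack(sampled, dim=2)  # (B, N, L, C)

# Toy usage: two pyramid levels, 4 sampled keypoints per image.
pyramid = [torch.randn(2, 256, 64, 64), torch.randn(2, 256, 32, 32)]
points = torch.rand(2, 4, 2)
print(sample_sparse_multiscale_features(pyramid, points).shape)  # torch.Size([2, 4, 2, 256])
```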
Pages: 6206 - 6216
Page count: 11
Related Papers
50 records in total (10 listed below)
  • [1] Transformer-based Multi-scale Underwater Image Enhancement Network
    Yang, Ai-Ping
    Fang, Si-Jie
    Shao, Ming-Fu
    Zhang, Teng-Fei
    Dongbei Daxue Xuebao/Journal of Northeastern University, 2024, 45 (12): 1696-1705
  • [2] Multi-Scale Transformer-Based Feature Combination for Image Retrieval
    Roig Mari, Carlos
    Varas Gonzalez, David
    Bou-Balust, Elisenda
    2022 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, ICIP, 2022: 3166-3170
  • [3] A Transformer-based method to simulate multi-scale soil moisture
    Liu, Yangxiaoyue
    Xin, Ying
    Yin, Cong
    JOURNAL OF HYDROLOGY, 2025, 655
  • [4] MUSTER: A Multi-Scale Transformer-Based Decoder for Semantic Segmentation
    Xu, Jing
    Shi, Wentao
    Gao, Pan
    Li, Qizhu
    Wang, Zhengwei
    IEEE TRANSACTIONS ON EMERGING TOPICS IN COMPUTATIONAL INTELLIGENCE, 2025, 9 (01): 202-212
  • [5] TransMF: Transformer-Based Multi-Scale Fusion Model for Crack Detection
    Ju, Xiaochen
    Zhao, Xinxin
    Qian, Shengsheng
    MATHEMATICS, 2022, 10 (13)
  • [6] ScaleFormer: Transformer-based speech enhancement in the multi-scale time domain
    Wu, Tianci
    He, Shulin
    Zhang, Hui
    Zhang, XueLiang
    2023 ASIA PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE, APSIPA ASC, 2023: 2448-2453
  • [7] Feature Aggregated Queries for Transformer-based Video Object Detectors
    Cui, Yiming
    2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR, 2023: 6365-6376
  • [8] Transformer-Based Multi-Scale Feature Remote Sensing Image Classification Model
    Sun, Ting
    Li, Jun
    Zhou, Xiangrui
    Chen, Zan
    IEEE ACCESS, 2025, 13: 34095-34104
  • [9] Transformer-Based Multi-Scale Feature Integration Network for Video Saliency Prediction
    Zhou, Xiaofei
    Wu, Songhe
    Shi, Ran
    Zheng, Bolun
    Wang, Shuai
    Yin, Haibing
    Zhang, Jiyong
    Yan, Chenggang
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2023, 33 (12): 7696-7707
  • [10] Multi-scale Feature Fusion Object Detection Based on Swin Transformer
    Zhang, Ying
    Wu, Lin
    Deng, Huaxuan
    Hu, Jun
    Li, Xifan
    39TH YOUTH ACADEMIC ANNUAL CONFERENCE OF CHINESE ASSOCIATION OF AUTOMATION, YAC 2024, 2024: 1982-1987