PYRAMID MASKED IMAGE MODELING FOR TRANSFORMER-BASED AERIAL OBJECT DETECTION

Cited by: 2

Authors:

Zhang, Cong [1]

Liu, Tianshan [1]

Ju, Yakun [1]

Lam, Kin-Man [1]

Affiliations:

[1] Hong Kong Polytech Univ, Dept Elect & Informat Engn, Kowloon, Hong Kong, Peoples R China

Source:

2023 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, ICIP | 2023

Keywords:

Vision Transformer; Masked Image Modeling; Self-Supervised Learning; Pyramid Architecture; Aerial Object Detection

DOI:

10.1109/ICIP49359.2023.10223093

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Subject classification codes:

081104; 0812; 0835; 1405

Abstract:

Two obstacles, the scarcity of annotated samples and the difficulty in preserving multi-scale hierarchical representations, hinder the advancement of vision Transformer-based aerial object detection. The emergence of self-supervised learning has inspired some solutions to the first issue. However, most solutions focus on single-scale features, conflicting with solving the second issue. To bridge this gap, this paper proposes a novel pyramid masked image modeling (MIM) framework, termed PyraMIM, for self-supervised pretraining in aerial scenarios. Without manual annotation, PyraMIM enables establishing pyramid representations during pretraining, which can be seamlessly adapted to downstream aerial object detection for performance improvement. Experimental results demonstrate the effectiveness and superiority of our method.

Pages: 1675-1679

Number of pages: 5
