A cross-attention integrated shifted window transformer for remote sensing image scene recognition with limited data

Cited by: 0
Authors
Li, Kaiyuan [1]
Xue, Yong [1]
Zhao, Jiaqi [2]
Li, Honghao [1]
Zhang, Sheng [1]
Affiliations
[1] China Univ Min & Technol, Sch Environm Sci & Spatial Informat, Xuzhou, Peoples R China
[2] China Univ Min & Technol, Sch Comp Sci & Technol, Xuzhou, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
recognition; scene classification; insufficient data; remote sensing; deep learning; cross attention; CLASSIFICATION; NETWORK; TREE;
DOI
10.1117/1.JRS.18.036506
Chinese Library Classification number
X [Environmental Science, Safety Science]
Discipline classification code
08; 0830
Abstract
Remote sensing image scene recognition assigns semantic category labels to images based on their content and has applications in many fields. However, extracting category features is difficult when labeled samples are insufficient. We propose a Multi-scale Shift-window Cross-attention Vision Transformer (MSC-ViT) framework for remote sensing image scene recognition with limited data. The model is composed of three modules: a multi-scale feature extraction module, a shift-window transformer module, and a multi-scale cross-attention module. First, to use the available data more efficiently, a multi-scale module extracts both the object information and the spatial information contained in the image. Second, a hierarchical transformer structure based on shifted windows, which adapt flexibly to different scales, matches the computation to the multi-scale features. Third, a token fusion method based on the cross-attention mechanism fuses features between multi-branch tokens and class tokens, so that the information in the tokens is learned more fully and classification improves. In addition, we integrate existing open-source remote sensing image datasets into a new dataset better suited to scene recognition with limited data. Experimental results show that the proposed method performs well in scene classification of remote sensing images with limited data: its top-1 accuracy is 79.84% with a 20% training ratio, 84.78% with a 40% training ratio, 89.79% with a 60% training ratio, and 91.43% with an 80% training ratio.
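The abstract does not give implementation details for the cross-attention token fusion, so the following is only a minimal sketch of the general idea: a class token from one branch acts as the query and attends over the patch tokens of another branch. The function names (`softmax`, `cross_attend`), the identity query/key/value projections, the single attention head, the residual connection, and the toy inputs are all assumptions made for illustration and are not taken from the MSC-ViT paper.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(class_token, patch_tokens):
    """Fuse one branch's class token with another branch's patch tokens.

    Assumption: identity query/key/value projections and a single head,
    to keep the sketch short; a trained model would learn these projections.
    """
    dim = len(class_token)
    scale = 1.0 / math.sqrt(dim)
    # Scaled dot-product score between the class token (query)
    # and each patch token (key).
    scores = [
        scale * sum(q * k for q, k in zip(class_token, tok))
        for tok in patch_tokens
    ]
    weights = softmax(scores)
    # Attention output: weighted sum of the patch tokens (values).
    attended = [
        sum(w * tok[i] for w, tok in zip(weights, patch_tokens))
        for i in range(dim)
    ]
    # Residual connection keeps the original class-token information.
    fused = [c + a for c, a in zip(class_token, attended)]
    return fused, weights


# Hypothetical toy inputs: a 4-dimensional class token from a coarse-scale
# branch and three patch tokens from a fine-scale branch.
cls_coarse = [1.0, 0.0, 0.0, 0.0]
patches_fine = [
    [2.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
]
fused_token, attn_weights = cross_attend(cls_coarse, patches_fine)
print(attn_weights)
print(fused_token)
```

In this toy case the first patch token is the one most aligned with the class token, so it receives the largest attention weight and contributes most to the fused token. Only the class token is used as a query here, which keeps the cost linear in the number of patch tokens; whether MSC-ViT makes the same choice is not stated in the abstract.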
Pages: 19
Related papers
  • [1] A cross-attention integrated shifted window transformer for remote sensing image scene recognition with limited data
    Li, Kaiyuan
    Xue, Yong
    Zhao, Jiaqi
    Li, Honghao
    Zhang, Sheng
    Journal of Applied Remote Sensing, 2024, 18 (03)
  • [2] A Novel Transformer Network With Shifted Window Cross-Attention for Spatiotemporal Weather Forecasting
    Bojesomo, Alabi
    Almarzouqi, Hasan
    Liatsis, Panos
    IEEE JOURNAL OF SELECTED TOPICS IN APPLIED EARTH OBSERVATIONS AND REMOTE SENSING, 2024, 17 : 45 - 55
  • [3] Remote sensing image change detection based on swin transformer and cross-attention mechanism
    Yan, Weidong
    Cao, Li
    Yan, Pei
    Zhu, Chaosheng
    Wang, Mengtian
    EARTH SCIENCE INFORMATICS, 2025, 18 (01)
  • [4] Multiscale Sparse Cross-Attention Network for Remote Sensing Scene Classification
    Ma, Jingjing
    Jiang, Wei
    Tang, Xu
    Zhang, Xiangrong
    Liu, Fang
    Jiao, Licheng
    IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2025, 63
  • [5] Remote Sensing Image Classification Based on a Cross-Attention Mechanism and Graph Convolution
    Cai, Weiwei
    Wei, Zhanguo
    IEEE GEOSCIENCE AND REMOTE SENSING LETTERS, 2022, 19
  • [6] Deformable Cross-Attention Transformer for Medical Image Registration
    Chen, Junyu
    Liu, Yihao
    He, Yufan
    Du, Yong
    MACHINE LEARNING IN MEDICAL IMAGING, MLMI 2023, PT I, 2024, 14348 : 115 - 125
  • [7] MCADNet: A Multi-Scale Cross-Attention Network for Remote Sensing Image Dehazing
    Tao, Tao
    Xu, Haoran
    Guan, Xin
    Zhou, Hao
    MATHEMATICS, 2024, 12 (23)
  • [8] Optical remote sensing image salient object detection via bidirectional cross-attention and attention restoration
    Gu, Yubin
    Chen, Siting
    Sun, Xiaoshuai
    Ji, Jiayi
    Zhou, Yiyi
    Ji, Rongrong
    PATTERN RECOGNITION, 2025, 164
  • [9] Optimization-Inspired Cross-Attention Transformer for Compressive Sensing
    Song, Jiechong
    Mou, Chong
    Wang, Shiqi
    Ma, Siwei
    Zhang, Jian
    2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR, 2023, : 6174 - 6184
  • [10] Multimodal Personality Recognition using Cross-attention Transformer and Behaviour Encoding
    Agrawal, Tanay
    Agarwal, Dhruv
    Balazia, Michal
    Sinha, Neelabh
    Bremond, Francois
    PROCEEDINGS OF THE 17TH INTERNATIONAL JOINT CONFERENCE ON COMPUTER VISION, IMAGING AND COMPUTER GRAPHICS THEORY AND APPLICATIONS (VISAPP), VOL 5, 2022, : 501 - 508