A cross-attention integrated shifted window transformer for remote sensing image scene recognition with limited data

Authors
Li, Kaiyuan [1]
Xue, Yong [1]
Zhao, Jiaqi [2]
Li, Honghao [1]
Zhang, Sheng [1]
Affiliations
[1] China Univ Min & Technol, Sch Environm Sci & Spatial Informat, Xuzhou, Peoples R China
[2] China Univ Min & Technol, Sch Comp Sci & Technol, Xuzhou, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
recognition; scene classification; insufficient data; remote sensing; deep learning; cross attention; CLASSIFICATION; NETWORK; TREE;
DOI
10.1117/1.JRS.18.036506
Chinese Library Classification number
X [Environmental Science, Safety Science]
Discipline classification code
08; 0830
Abstract
Remote sensing image scene recognition aims to assign semantic category labels to images based on their contents, and it has a wide range of applications in many fields. However, extracting category features from insufficient labeled samples is a great challenge. We propose a Multi-scale Shift-window Cross-attention Vision Transformer (MSC-ViT) framework for remote sensing image scene recognition with limited data. Specifically, the proposed model is composed of three modules: a multi-scale feature extraction module, a shift-window transformer module, and a multi-scale cross-attention module. First, to enhance the efficiency of data utilization, we design a multi-scale module to fully extract the object and spatial information contained in the image. Second, the hierarchical transformer structure based on shifted windows, which are flexible at different scales, matches the computation of multi-scale features. Third, the token fusion method based on the cross-attention mechanism fuses the features between multi-branch tokens and class tokens, which fully learns the information of the tokens and achieves better classification results. In addition, we integrate existing open-source remote sensing image datasets into a new dataset better suited to the scene recognition task with limited data. Our experimental results show that the proposed method performs well in scene classification of remote sensing images with limited data. The top-1 accuracy of the developed method is 79.84% with a 20% training ratio, 84.78% with a 40% training ratio, 89.79% with a 60% training ratio, and 91.43% with an 80% training ratio.
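The cross-attention token fusion mentioned in the abstract can be illustrated with a small sketch. This is not the authors' MSC-ViT implementation. It assumes a single attention head, two branches sharing the same token dimension, and a class token from one branch acting as the query over the patch tokens of the other branch. The names `cross_attend`, `cls_small`, `patches_large`, and the random projection matrices are hypothetical and chosen only for illustration.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vector):
    """Multiply a (rows x cols) matrix by a vector of length cols."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def cross_attend(class_token, patch_tokens, w_q, w_k, w_v):
    """Single-head cross-attention (illustrative assumption, not the paper's code).

    The class token of one branch is the query; the patch tokens of the
    other branch supply keys and values. Returns the fused class token
    (with a residual connection) and the attention weights.
    """
    dim = len(class_token)
    query = matvec(w_q, class_token)
    keys = [matvec(w_k, t) for t in patch_tokens]
    values = [matvec(w_v, t) for t in patch_tokens]
    # Scaled dot-product score between the query and each key.
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(dim) for key in keys]
    weights = softmax(scores)
    # Weighted sum of the value vectors, one output coordinate at a time.
    attended = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    fused = [c + a for c, a in zip(class_token, attended)]
    return fused, weights


def random_matrix(rng, rows, cols):
    """Random projection matrix standing in for learned weights."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
DIM = 4
cls_small = [rng.uniform(-1, 1) for _ in range(DIM)]  # class token, branch A
patches_large = [[rng.uniform(-1, 1) for _ in range(DIM)] for _ in range(6)]  # patch tokens, branch B
w_q, w_k, w_v = (random_matrix(rng, DIM, DIM) for _ in range(3))
fused_cls, attn = cross_attend(cls_small, patches_large, w_q, w_k, w_v)
```

Here `attn` holds one weight per patch token of the other branch, and `fused_cls` is the class token after it has absorbed that branch's information. A full model would likely use multiple heads, learned weights, and per-branch projections when token dimensions differ.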
Pages: 19