A cross-attention integrated shifted window transformer for remote sensing image scene recognition with limited data

Cited by: 0

Authors:

Li, Kaiyuan [1]

Xue, Yong [1]

Zhao, Jiaqi [2]

Li, Honghao [1]

Zhang, Sheng [1]

Affiliations:

[1] China Univ Min & Technol, Sch Environm Sci & Spatial Informat, Xuzhou, Peoples R China

[2] China Univ Min & Technol, Sch Comp Sci & Technol, Xuzhou, Peoples R China

Source:

JOURNAL OF APPLIED REMOTE SENSING | 2024, Vol. 18, No. 03

Funding:

National Natural Science Foundation of China;

Keywords:

recognition; scene classification; insufficient data; remote sensing; deep learning; cross attention; CLASSIFICATION; NETWORK; TREE;

DOI:

10.1117/1.JRS.18.036506

Chinese Library Classification (CLC) number:

X [Environmental Science, Safety Science];

Discipline classification code:

08 ; 0830 ;

Abstract:

Remote sensing image scene recognition aims to assign semantic category labels to images based on their contents, and it has a wide range of applications. However, extracting category features from insufficient labeled samples is a great challenge. We propose a Multi-scale Shift-window Cross-attention Vision Transformer (MSC-ViT) framework for remote sensing image scene recognition with limited data. Specifically, the proposed model is composed of three modules: a multi-scale feature extraction module, a shift-window transformer module, and a multi-scale cross-attention module. First, to use the data more efficiently, we design a multi-scale module that extracts the object information and spatial information contained in the image. Second, a hierarchical transformer structure based on shifted windows, which are flexible at different scales, matches the computation of multi-scale features. Third, a token fusion method based on the cross-attention mechanism fuses the features between multi-branch tokens and class tokens, which fully learns the information in the tokens and achieves better classification results. In addition, we integrate existing open-source remote sensing image datasets into a new dataset that is better suited to scene recognition with limited data. Our experimental results show that the proposed method performs well in scene classification of remote sensing images with limited data. The top-1 accuracy of the developed method is 79.84% with a 20% training ratio, 84.78% with a 40% training ratio, 89.79% with a 60% training ratio, and 91.43% with an 80% training ratio.
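The token fusion idea mentioned in the abstract can be illustrated with a minimal sketch. It assumes a generic single-head cross-attention in which the class token of one branch acts as the query and the patch tokens of another branch act as keys and values. The abstract does not give the authors' exact formulation, so the function names (`softmax`, `cross_attend`), the absence of learned projection matrices, and the residual addition are all illustrative assumptions rather than the MSC-ViT implementation.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(cls_token, patch_tokens):
    """Hypothetical single-head cross-attention without learned projections.

    cls_token:    query vector from one branch (list of d floats)
    patch_tokens: key/value vectors from another branch (list of d-float lists)
    Returns the fused class token and the attention weights.
    """
    d = len(cls_token)
    # Scaled dot-product score between the class token and each patch token.
    scores = [
        sum(q * k for q, k in zip(cls_token, token)) / math.sqrt(d)
        for token in patch_tokens
    ]
    weights = softmax(scores)
    # Weighted sum of the patch tokens (they serve as the values here).
    attended = [
        sum(w * token[i] for w, token in zip(weights, patch_tokens))
        for i in range(d)
    ]
    # Residual connection (an assumption): add the attended summary
    # back onto the original class token.
    fused = [c + a for c, a in zip(cls_token, attended)]
    return fused, weights


# Toy data: one class token from a small-scale branch and three patch
# tokens from a large-scale branch, all of dimension 4.
cls_small = [1.0, 0.0, 0.0, 0.0]
patches_large = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
]
fused, weights = cross_attend(cls_small, patches_large)
print(weights)
print(fused)
```

In this toy case the first patch token is the one most similar to the class token, so it receives the largest attention weight. A trained model would additionally apply learned query, key, and value projections and typically several attention heads.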

Pages: 19
