Swin Transformer Embedding UNet for Remote Sensing Image Semantic Segmentation

Cited by: 398
Authors
He, Xin [1, 2]
Zhou, Yong [1, 2]
Zhao, Jiaqi [1, 2]
Zhang, Di [1, 2]
Yao, Rui [1, 2]
Xue, Yong [3, 4]
Affiliations
[1] China Univ Min & Technol, Sch Comp Sci & Technol, Xuzhou 221116, Jiangsu, Peoples R China
[2] Minist Educ Peoples Republ China, Engn Res Ctr Mine Digitizat, Xuzhou 221116, Jiangsu, Peoples R China
[3] China Univ Min & Technol, Sch Environm Sci & Spatial Informat, Xuzhou 221116, Jiangsu, Peoples R China
[4] Univ Derby, Sch Elect Comp & Math, Derby DE22 1GB, England
Funding
National Natural Science Foundation of China
Keywords
Transformers; Semantics; Image segmentation; Feature extraction; Convolutional neural networks; Remote sensing; Task analysis; Global information embedding; remote sensing (RS); semantic segmentation; Swin transformer; CLASSIFICATION; RECOGNITION;
DOI
10.1109/TGRS.2022.3144165
Chinese Library Classification (CLC) number
P3 [Geophysics]; P59 [Geochemistry]
Discipline classification code
0708; 070902
Abstract
Global context information is essential for the semantic segmentation of remote sensing (RS) images. However, most existing methods rely on convolutional neural networks (CNNs), which struggle to capture global context directly because the convolution operation is local. Inspired by the Swin transformer and its powerful global modeling capabilities, we propose a novel semantic segmentation framework for RS images called ST-UNet, which embeds the Swin transformer into the classical CNN-based U-shaped network (UNet). ST-UNet constitutes a novel dual encoder structure in which the Swin transformer and a CNN run in parallel. First, we propose a spatial interaction module (SIM), which encodes spatial information in the Swin transformer block by establishing pixel-level correlations, enhancing the feature representation of occluded objects. Second, we construct a feature compression module (FCM) that reduces the loss of detailed information and condenses more small-scale features during patch token downsampling in the Swin transformer, which improves the segmentation accuracy of small-scale ground objects. Finally, as a bridge between the dual encoders, a relational aggregation module (RAM) is designed to hierarchically integrate global dependencies from the Swin transformer into the CNN features. ST-UNet brings significant improvements on both the ISPRS-Vaihingen and Potsdam datasets. The code will be available at https://github.com/XinnHe/ST-UNet.
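The locality argument in the abstract can be made concrete with standard receptive-field arithmetic for stacked convolutions. The following is a minimal sketch and is not taken from the ST-UNet code; the function name `receptive_field` and the example layer stacks are hypothetical and chosen only for illustration.

```python
def receptive_field(layers):
    """Return the receptive field, in input pixels along one axis, of a
    stack of convolution layers.

    `layers` is a list of (kernel_size, stride) pairs applied in order.
    Dilation and padding are ignored in this simplified sketch.
    """
    rf = 1    # with no layers, one output pixel sees one input pixel
    jump = 1  # spacing, in input pixels, between adjacent output pixels
    for kernel_size, stride in layers:
        rf += (kernel_size - 1) * jump
        jump *= stride
    return rf


# Hypothetical stack: ten 3x3 convolutions with stride 1.
plain_stack = [(3, 1)] * 10
print(receptive_field(plain_stack))    # 21

# Hypothetical stack: three 3x3 convolutions with stride 2.
strided_stack = [(3, 2)] * 3
print(receptive_field(strided_stack))  # 15
```

Under these assumptions, ten stride-1 3x3 layers cover only 21 input pixels along each axis, far less than the width of a typical RS image tile, whereas a self-attention layer relates every token within its attention scope in a single step. This is the gap that motivates pairing a CNN encoder with a transformer encoder.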
Pages: 15
Related papers
50 in total
  • [21] MCST-UNET: HIGH-RESOLUTION SEMANTIC SEGMENTATION WITH SWIN TRANSFORMER MODULES
    Guo, Hongwei
    Ma, Chuang
    Xu, Guangxia
    Liu, Pengcheng
    JOURNAL OF NONLINEAR AND CONVEX ANALYSIS, 2024, 25 (06) : 1313 - 1323
  • [22] SUNet: Swin Transformer UNet for Image Denoising
    Fan, Chi-Mao
    Liu, Tsung-Jung
    Liu, Kuan-Hsien
    2022 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS (ISCAS 22), 2022, : 2333 - 2337
  • [23] Csswin-unet: a Swin-unet network for semantic segmentation of remote sensing images by aggregating contextual information and extracting spatial information
    Xiao, Dong
    Kang, Zhihao
    Fu, Yanhua
    Li, Zhenni
    Ran, Mengying
    INTERNATIONAL JOURNAL OF REMOTE SENSING, 2023, 44 (23) : 7598 - 7625
  • [24] UNet-like network fused swin transformer and CNN for semantic image synthesis
    Ke, Aihua
    Luo, Jian
    Cai, Bo
    SCIENTIFIC REPORTS, 2024, 14 (01):
  • [25] UNetFormer: A UNet-like transformer for efficient semantic segmentation of remote sensing urban scene imagery
    Wang, Libo
    Li, Rui
    Zhang, Ce
    Fang, Shenghui
    Duan, Chenxi
    Meng, Xiaoliang
    Atkinson, Peter M.
    ISPRS JOURNAL OF PHOTOGRAMMETRY AND REMOTE SENSING, 2022, 190 : 196 - 214
  • [26] MMT: Mixed-Mask Transformer for Remote Sensing Image Semantic Segmentation
    Xu, Zhe
    Geng, Jie
    Jiang, Wen
    IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2023, 61
  • [27] Hybrid Attention Fusion Embedded in Transformer for Remote Sensing Image Semantic Segmentation
    Chen, Yan
    Dong, Quan
    Wang, Xiaofeng
    Zhang, Qianchuan
    Kang, Menglei
    Jiang, Wenxiang
    Wang, Mengyuan
    Xu, Lixiang
    Zhang, Chen
    IEEE JOURNAL OF SELECTED TOPICS IN APPLIED EARTH OBSERVATIONS AND REMOTE SENSING, 2024, 17 : 4421 - 4435
  • [28] SwinE-UNet3+: swin transformer encoder network for medical image segmentation
    Ping Zou
    Jian-Sheng Wu
    Progress in Artificial Intelligence, 2023, 12 : 99 - 105
  • [29] SwinE-UNet3+: swin transformer encoder network for medical image segmentation
    Zou, Ping
    Wu, Jian-Sheng
    PROGRESS IN ARTIFICIAL INTELLIGENCE, 2023, 12 (01) : 99 - 105
  • [30] Efficient Transformer for Remote Sensing Image Segmentation
    Xu, Zhiyong
    Zhang, Weicun
    Zhang, Tianxiang
    Yang, Zhifang
    Li, Jiangyun
    REMOTE SENSING, 2021, 13 (18)