TCNet: Multiscale Fusion of Transformer and CNN for Semantic Segmentation of Remote Sensing Images

Cited by: 12

Authors:

Xiang, Xuyang [1]

Gong, Wenping [1]

Li, Shuailong [1]

Chen, Jun [2]

Ren, Tianhe [1]

Affiliations:

[1] China Univ Geosci, Fac Engn, Wuhan 430074, Peoples R China

[2] China Univ Geosci, Sch Automat, Wuhan 430074, Peoples R China

Source:

IEEE JOURNAL OF SELECTED TOPICS IN APPLIED EARTH OBSERVATIONS AND REMOTE SENSING | 2024 / Vol. 17

Keywords:

Convolutional Neural Network (CNN); feature fusion; remote sensing images; semantic segmentation; Transformer;

DOI:

10.1109/JSTARS.2024.3349625

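Chinese Library Classification (CLC) codes:

TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology];

Discipline classification codes: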

0808 ; 0809 ;

Abstract:

Semantic segmentation of remote sensing images plays a critical role in areas such as urban change detection, environmental protection, and geohazard identification. Convolutional Neural Networks (CNNs) have been extensively employed for semantic segmentation over the past few years; however, because the convolution operation is local, CNNs struggle to extract the global context of remote sensing images, which is vital for semantic segmentation. The recently developed Transformer, by contrast, has powerful global modeling capabilities. This article proposes a network called TCNet, which adopts a parallel-in-branch architecture of the Transformer and the CNN. TCNet thus takes advantage of both the Transformer and the CNN, capturing both global context and low-level spatial details in a much shallower manner. In addition, a novel fusion technique called Interactive Self-attention is proposed to fuse the multilevel features extracted from both branches. To bridge the semantic gap between regions, a skip connection module called Windowed Self-attention Gating is further developed and added to the progressive upsampling network. Experiments on three public datasets (i.e., Bijie Landslide Dataset, WHU Building Dataset, and Massachusetts Buildings Dataset) show that TCNet outperforms state-of-the-art models. The IoU values obtained by TCNet on these three datasets are 75.34% (ranked first among 10 models compared), 91.16% (ranked first among 13 models compared), and 76.21% (ranked first among 13 models compared), respectively.
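The abstract reports its results as IoU (Intersection over Union) values. As a reminder of what that metric measures, the following is a minimal sketch for a single binary mask pair. It is not the authors' evaluation code: the function name `binary_iou`, the nested-list mask representation, and the convention that two empty masks score 1.0 are all assumptions made for illustration.

```python
def binary_iou(pred, target):
    """Intersection over Union for two binary masks given as nested lists of 0/1.

    Hypothetical helper for illustration only; the record does not
    describe how the paper computes or averages IoU.
    """
    intersection = 0
    union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            # A pixel is in the intersection if both masks mark it,
            # and in the union if at least one mask marks it.
            intersection += 1 if (p and t) else 0
            union += 1 if (p or t) else 0
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if union == 0 else intersection / union


# Toy 2x3 masks (made-up values, not data from the paper).
pred = [[1, 1, 0],
        [0, 1, 0]]
target = [[1, 0, 0],
          [0, 1, 1]]
iou = binary_iou(pred, target)
print(f"IoU = {iou:.2%}")  # prints "IoU = 50.00%"
```

In this toy case two pixels are marked in both masks and four pixels are marked in at least one, giving 2 / 4 = 50%.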

Pages: 3123 - 3136

Page count: 14

50 items in total

[1] CNN and Transformer Fusion for Remote Sensing Image Semantic Segmentation
Chen, Xin
Li, Dongfen
Liu, Mingzhe
Jia, Jiaru
REMOTE SENSING, 2023, 15 (18)
[2] CMTFNet: CNN and Multiscale Transformer Fusion Network for Remote-Sensing Image Semantic Segmentation
Wu, Honglin
Huang, Peng
Zhang, Min
Tang, Wenlong
Yu, Xinyu
IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2023, 61
[3] CMLFormer: CNN and Multiscale Local-Context Transformer Network for Remote Sensing Images Semantic Segmentation
Wu, Honglin
Zhang, Min
Huang, Peng
Tang, Wenlong
IEEE JOURNAL OF SELECTED TOPICS IN APPLIED EARTH OBSERVATIONS AND REMOTE SENSING, 2024, 17 : 7233 - 7241
[4] Hybrid CNN and Transformer Network for Semantic Segmentation of UAV Remote Sensing Images
Zhou X.
Zhou L.
Gong S.
Zhang H.
Zhong S.
Xia Y.
Huang Y.
IEEE Journal on Miniaturization for Air and Space Systems, 2024, 5 (01) : 33 - 41
[5] MFTransNet: A Multi-Modal Fusion with CNN-Transformer Network for Semantic Segmentation of HSR Remote Sensing Images
He, Shumeng
Yang, Houqun
Zhang, Xiaoying
Li, Xuanyu
MATHEMATICS, 2023, 11 (03)
[6] MCAFNet: A Multiscale Channel Attention Fusion Network for Semantic Segmentation of Remote Sensing Images
Yuan, Min
Ren, Dingbang
Feng, Qisheng
Wang, Zhaobin
Dong, Yongkang
Lu, Fuxiang
Wu, Xiaolin
REMOTE SENSING, 2023, 15 (02)
[7] CTFNet: CNN-Transformer Fusion Network for Remote-Sensing Image Semantic Segmentation
Wu, Honglin
Huang, Peng
Zhang, Min
Tang, Wenlong
IEEE GEOSCIENCE AND REMOTE SENSING LETTERS, 2024, 21 : 1 - 5
[8] SEMANTIC SEGMENTATION FOR REMOTE SENSING IMAGES BASED ON SWIN-TRANSFORMER AND MULTISCALE FEATURE REFINEMENT
Zhu, Shengyu
IGARSS 2023 - 2023 IEEE INTERNATIONAL GEOSCIENCE AND REMOTE SENSING SYMPOSIUM, 2023 : 6370 - 6373
[9] A Multilevel Multimodal Fusion Transformer for Remote Sensing Semantic Segmentation
Ma, Xianping
Zhang, Xiaokang
Pun, Man-On
Liu, Ming
IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2024, 62 : 1 - 15
[10] Enhancing Multiscale Representations With Transformer for Remote Sensing Image Semantic Segmentation
Xiao, Tao
Liu, Yikun
Huang, Yuwen
Li, Mingsong
Yang, Gongping
IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2023, 61