Transformer-Based Semantic Segmentation for Extraction of Building Footprints from Very-High-Resolution Images

Cited by: 6

Authors:

Song, Jia [1,3]

Zhu, A-Xing [1,2]

Zhu, Yunqiang [1]

Affiliations:

[1] Chinese Acad Sci, Inst Geog Sci & Nat Resources Res, State Key Lab Resources & Environm Informat Syst, Beijing 100101, Peoples R China

[2] Univ Wisconsin, Dept Geog, Madison, WI 53706 USA

[3] Jiangsu Ctr Collaborat Innovat Geog Informat Resou, Nanjing 210023, Peoples R China

Source:

SENSORS | 2023 / Vol. 23 / Issue 11

Keywords:

vision transformer; hyperparameter; building; self-attention; deep learning; CLASSIFICATION; NETWORK;

DOI:

10.3390/s23115166

Chinese Library Classification (CLC) number:

O65 [Analytical Chemistry];

Discipline classification code:

070302 ; 081704 ;

Abstract:

Semantic segmentation with deep learning networks has become an important approach to extracting objects from very-high-resolution (VHR) remote sensing images. Vision Transformer networks have shown significant performance improvements over traditional convolutional neural networks (CNNs) in semantic segmentation, and their architectures differ from those of CNNs. Image patches, linear embedding, and multi-head self-attention (MHSA) are among their main hyperparameters. How these should be configured for the extraction of objects from VHR images, and how they affect network accuracy, has not been sufficiently investigated. This article explores the role of vision Transformer networks in the extraction of building footprints from VHR images. Transformer-based models with different hyperparameter values were designed and compared, and the impact of those values on accuracy was analyzed. The results show that smaller image patches and higher-dimension embeddings result in better accuracy. In addition, the Transformer-based network is shown to be scalable: it can be trained on general-scale graphics processing units (GPUs) with model sizes and training times comparable to those of CNNs while achieving higher accuracy. The study provides valuable insights into the potential of vision Transformer networks for object extraction from VHR images.
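
To make the hyperparameters named in the abstract concrete, the following minimal Python sketch works through the standard vision Transformer arithmetic that links patch size, embedding dimension, and number of attention heads to token count and attention cost. It is an illustration only, not the authors' model: the function name `patch_embedding_stats` and all numeric values (512-pixel tiles, 3 channels, 768-dimensional embeddings, 12 heads) are assumptions chosen for the example and are not taken from the paper.

```python
def patch_embedding_stats(image_size, patch_size, channels, embed_dim, num_heads):
    """Basic size arithmetic for a plain ViT-style patch embedding.

    Hypothetical helper for illustration; assumes square images, square
    non-overlapping patches, and one linear layer (with bias) for embedding.
    """
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    if embed_dim % num_heads != 0:
        raise ValueError("embed_dim must be divisible by num_heads")

    patches_per_side = image_size // patch_size
    num_tokens = patches_per_side ** 2               # sequence length fed to the encoder
    patch_len = patch_size * patch_size * channels   # flattened pixels per patch
    embed_params = patch_len * embed_dim + embed_dim # linear embedding weights + bias
    head_dim = embed_dim // num_heads                # width of each attention head
    attn_entries = num_heads * num_tokens ** 2       # attention scores per layer

    return {
        "num_tokens": num_tokens,
        "patch_len": patch_len,
        "embed_params": embed_params,
        "head_dim": head_dim,
        "attn_entries": attn_entries,
    }


# Assumed example settings: compare two patch sizes on the same tile.
for patch in (16, 8):
    stats = patch_embedding_stats(
        image_size=512, patch_size=patch, channels=3, embed_dim=768, num_heads=12
    )
    print(patch, stats)
```

Under these assumed settings, halving the patch size from 16 to 8 pixels quadruples the token count (1,024 to 4,096) and multiplies the number of attention scores per layer by 16, which is the usual cost of the finer spatial detail that smaller patches provide.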

Pages: 19
