Transformer-Based Semantic Segmentation for Extraction of Building Footprints from Very-High-Resolution Images

Authors
Song, Jia [1, 3]
Zhu, A-Xing [1, 2]
Zhu, Yunqiang [1]
Affiliations
[1] Chinese Acad Sci, Inst Geog Sci & Nat Resources Res, State Key Lab Resources & Environm Informat Syst, Beijing 100101, Peoples R China
[2] Univ Wisconsin, Dept Geog, Madison, WI 53706 USA
[3] Jiangsu Ctr Collaborat Innovat Geog Informat Resou, Nanjing 210023, Peoples R China
Keywords
vision transformer; hyperparameter; building; self-attention; deep learning; CLASSIFICATION; NETWORK;
DOI
10.3390/s23115166
Chinese Library Classification
O65 [Analytical Chemistry];
Subject classification codes
070302; 081704;
Abstract
Semantic segmentation with deep learning networks has become an important approach to extracting objects from very-high-resolution (VHR) remote sensing images. Vision Transformer networks have shown significant performance improvements over traditional convolutional neural networks (CNNs) in semantic segmentation, and their architectures differ from those of CNNs. The image patch size, the linear embedding dimension, and the multi-head self-attention (MHSA) configuration are among their main hyperparameters. How these should be configured for object extraction from VHR images, and how they affect network accuracy, has not been sufficiently investigated. This article explores the role of vision Transformer networks in extracting building footprints from VHR images. Transformer-based models with different hyperparameter values were designed and compared, and the impact of those values on accuracy was analyzed. The results show that smaller image patches and higher-dimensional embeddings yield better accuracy. In addition, the Transformer-based network is shown to be scalable: it can be trained on general-scale graphics processing units (GPUs) with model sizes and training times comparable to those of CNNs while achieving higher accuracy. The study provides valuable insights into the potential of vision Transformer networks for object extraction from VHR images.
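To make the patch-size and embedding-dimension hyperparameters concrete, the following is a minimal sketch of how they determine the token count and the size of a standard ViT-style linear patch embedding. It is not the authors' implementation: the function name `vit_token_stats`, the 512-pixel tile size, and the 3-band input are assumptions chosen only for illustration.

```python
def vit_token_stats(image_size, patch_size, embed_dim, channels=3):
    """Illustrative arithmetic for a ViT-style patch embedding.

    All names and default values here are hypothetical; they are not
    taken from the paper summarized above.
    """
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")

    patches_per_side = image_size // patch_size
    num_tokens = patches_per_side ** 2                  # sequence length seen by self-attention
    patch_vector_len = channels * patch_size ** 2       # flattened pixels per patch
    # Linear embedding: one weight matrix plus one bias vector.
    embed_params = patch_vector_len * embed_dim + embed_dim
    # Self-attention compares every token with every other token.
    attention_entries = num_tokens ** 2

    return {
        "num_tokens": num_tokens,
        "patch_vector_len": patch_vector_len,
        "embed_params": embed_params,
        "attention_entries": attention_entries,
    }


if __name__ == "__main__":
    # Compare two patch sizes on an assumed 512 x 512 RGB tile.
    for patch in (16, 8):
        print(patch, vit_token_stats(512, patch, embed_dim=768))
```

Under these assumptions, halving the patch size quadruples the number of tokens and multiplies the number of pairwise attention entries by sixteen, which is one reason smaller patches trade finer spatial detail against compute and memory.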
Pages: 19