Deep Symmetric Fusion Transformer for Multimodal Remote Sensing Data Classification

Cited: 3
Authors
Chang, Honghao [1 ]
Bi, Haixia [1 ]
Li, Fan [1 ]
Xu, Chen [2 ,3 ]
Chanussot, Jocelyn [4 ]
Hong, Danfeng [5 ,6 ]
Affiliations
[1] Xi An Jiao Tong Univ, Sch Informat & Commun Engn, Xian 710049, Peoples R China
[2] Peng Cheng Lab, Dept Math & Fundamental Res, Shenzhen 518055, Peoples R China
[3] Xi An Jiao Tong Univ, Sch Math & Stat, Xian 710049, Peoples R China
[4] Univ Grenoble Alpes, CNRS, INRIA, Grenoble INP LJK, F-38000 Grenoble, France
[5] Chinese Acad Sci, Aerosp Informat Res Inst, Beijing 100049, Peoples R China
[6] Univ Chinese Acad Sci, Sch Elect Elect & Commun Engn, Beijing 100049, Peoples R China
Keywords
Land-cover classification; local-global mixture (LGM); multimodal feature fusion; remote sensing; symmetric fusion transformer (SFT); LAND-COVER CLASSIFICATION; LIDAR DATA;
DOI
10.1109/TGRS.2024.3476975
CLC Classification Number
P3 [Geophysics]; P59 [Geochemistry];
Discipline Code
0708; 070902;
Abstract
In recent years, multimodal remote sensing data classification (MMRSC) has attracted growing attention because it delineates Earth's surface more comprehensively and accurately than its single-modal counterpart. However, capturing and integrating local and global features from single-modal data remains challenging, and fully excavating and exploiting the interactions between different modalities is still an intricate issue. To this end, we propose a novel dual-branch transformer-based framework named the deep symmetric fusion transformer (DSymFuser). Within the framework, each branch contains a stack of local-global mixture (LGM) blocks that extract hierarchical and discriminative single-modal features. In each LGM block, a local-global feature mixer with learnable weights adaptively aggregates the local and global features extracted by a convolutional neural network (CNN)-transformer network. Furthermore, we design a symmetric fusion transformer (SFT) that follows each LGM block. The SFT symmetrically excavates cross-modal correlations, comprehensively exploiting the complementary cues underlying the heterogeneous modalities. The hierarchical arrangement of the LGM and SFT blocks enables feature extraction and fusion in a multilevel manner, further promoting the completeness and descriptiveness of the learned features. Extensive ablation studies and comparative experiments on three benchmark datasets validate the effectiveness and superiority of the proposed method. The source code will be made publicly available at https://github.com/HaixiaBi1982/DSymFuser.
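The two mechanisms highlighted in the abstract, a learnable-weight mixer that gates local (convolutional) against global (self-attention) features, and a symmetric cross-attention in which each modality attends to the other, can be sketched in minimal single-head NumPy form. This is an illustrative sketch only: the function names, the scalar sigmoid gate `alpha`, and the shared projection matrices are assumptions for exposition, not the authors' implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, Wq, Wk, Wv):
    # Single-head scaled dot-product self-attention: the "global" path.
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    return softmax(q @ k.T / np.sqrt(q.shape[-1])) @ v

def local_conv(x, kernel):
    # 1-D convolution over the token axis, applied per channel: the "local" path.
    n, _ = x.shape
    pad = len(kernel) // 2
    xp = np.pad(x, ((pad, pad), (0, 0)))
    out = np.zeros_like(x)
    for i in range(n):
        out[i] = sum(kernel[j] * xp[i + j] for j in range(len(kernel)))
    return out

def lgm_mix(x, alpha, Wq, Wk, Wv, kernel):
    # Adaptive aggregation: a learnable scalar (here `alpha`, passed through a
    # sigmoid) weights the local path against the global path.
    g = 1.0 / (1.0 + np.exp(-alpha))
    return g * local_conv(x, kernel) + (1.0 - g) * self_attention(x, Wq, Wk, Wv)

def symmetric_cross_attention(xa, xb, Wq, Wk, Wv):
    # Symmetric cross-modal fusion: modality A queries modality B and vice
    # versa, with shared projections, and each output is added residually.
    d = Wq.shape[1]
    a2b = softmax((xa @ Wq) @ (xb @ Wk).T / np.sqrt(d)) @ (xb @ Wv)
    b2a = softmax((xb @ Wq) @ (xa @ Wk).T / np.sqrt(d)) @ (xa @ Wv)
    return xa + a2b, xb + b2a
```

Stacking `lgm_mix` followed by `symmetric_cross_attention` per level would mirror the multilevel extract-then-fuse arrangement the abstract describes.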
Pages: 15
Related Papers (50 records)
  • [1] Multimodal Fusion Transformer for Remote Sensing Image Classification
    Roy, Swalpa Kumar
    Deria, Ankur
    Hong, Danfeng
    Rasti, Behnood
    Plaza, Antonio
    Chanussot, Jocelyn
    IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2023, 61
  • [2] A multimodal hyper-fusion transformer for remote sensing image classification
    Ma, Mengru
    Ma, Wenping
    Jiao, Licheng
    Liu, Xu
    Li, Lingling
    Feng, Zhixi
    Liu, Fang
    Yang, Shuyuan
    INFORMATION FUSION, 2023, 96 : 66 - 79
  • [3] Fractional Fourier Image Transformer for Multimodal Remote Sensing Data Classification
    Zhao, Xudong
    Zhang, Mengmeng
    Tao, Ran
    Li, Wei
    Liao, Wenzhi
    Tian, Lianfang
    Philips, Wilfried
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2024, 35 (02) : 2314 - 2326
  • [4] Deep Fusion of Remote Sensing Data for Accurate Classification
    Chen, Yushi
    Li, Chunyang
    Ghamisi, Pedram
    Jia, Xiuping
    Gu, Yanfeng
    IEEE GEOSCIENCE AND REMOTE SENSING LETTERS, 2017, 14 (08) : 1253 - 1257
  • [5] Scale Adaptive Fusion Network for Multimodal Remote Sensing Data Classification
    Liu, Xiaomin
    Yu, Mengjun
    Qiao, Zhenzhuang
    Wang, Haoyu
    Xing, Changda
    Dianzi Yu Xinxi Xuebao/Journal of Electronics and Information Technology, 2024, 46 (09): : 3693 - 3702
  • [6] Deep learning in multimodal remote sensing data fusion: A comprehensive review
    Li, Jiaxin
    Hong, Danfeng
    Gao, Lianru
    Yao, Jing
    Zheng, Ke
    Zhang, Bing
    Chanussot, Jocelyn
    INTERNATIONAL JOURNAL OF APPLIED EARTH OBSERVATION AND GEOINFORMATION, 2022, 112
  • [7] A Multilevel Multimodal Fusion Transformer for Remote Sensing Semantic Segmentation
    Ma, Xianping
    Zhang, Xiaokang
    Pun, Man-On
    Liu, Ming
    IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2024, 62 : 1 - 15
  • [9] Deep learning decision fusion for the classification of urban remote sensing data
    Abdi, Ghasem
    Samadzadegan, Farhad
    Reinartz, Peter
    JOURNAL OF APPLIED REMOTE SENSING, 2018, 12 (01):
  • [10] HGR Correlation Pooling Fusion Framework for Recognition and Classification in Multimodal Remote Sensing Data
    Zhang, Hongkang
    Huang, Shao-Lun
    Kuruoglu, Ercan Engin
    REMOTE SENSING, 2024, 16 (10)