Deep Symmetric Fusion Transformer for Multimodal Remote Sensing Data Classification

Cited by: 3
Authors
Chang, Honghao [1]
Bi, Haixia [1]
Li, Fan [1]
Xu, Chen [2, 3]
Chanussot, Jocelyn [4]
Hong, Danfeng [5, 6]
Affiliations
[1] Xi An Jiao Tong Univ, Sch Informat & Commun Engn, Xian 710049, Peoples R China
[2] Peng Cheng Lab, Dept Math & Fundamental Res, Shenzhen 518055, Peoples R China
[3] Xi An Jiao Tong Univ, Sch Math & Stat, Xian 710049, Peoples R China
[4] Univ Grenoble Alpes, CNRS, INRIA, Grenoble INP LJK, F-38000 Grenoble, France
[5] Chinese Acad Sci, Aerosp Informat Res Inst, Beijing 100049, Peoples R China
[6] Univ Chinese Acad Sci, Sch Elect Elect & Commun Engn, Beijing 100049, Peoples R China
Keywords
Land-cover classification; local-global mixture (LGM); multimodal feature fusion; remote sensing; symmetric fusion transformer (SFT); LAND-COVER CLASSIFICATION; LIDAR DATA;
DOI
10.1109/TGRS.2024.3476975
Chinese Library Classification (CLC) number
P3 [Geophysics]; P59 [Geochemistry]
Discipline classification code
0708; 070902
Abstract
In recent years, multimodal remote sensing data classification (MMRSC) has attracted growing attention because it delineates Earth's surface more comprehensively and accurately than its single-modal counterpart. However, it remains challenging to capture and integrate local and global features from single-modal data. Moreover, how to fully mine and exploit the interactions between different modalities is still an open issue. To this end, we propose a novel dual-branch transformer-based framework named deep symmetric fusion transformer (DSymFuser). Within the framework, each branch contains a stack of local-global mixture (LGM) blocks that extract hierarchical and discriminative single-modal features. In each LGM block, a local-global feature mixer with learnable weights adaptively aggregates the local and global features extracted by a convolutional neural network (CNN)-transformer network. Furthermore, we design a symmetric fusion transformer (SFT) that follows each LGM block. The SFT mines cross-modal correlations symmetrically, exploiting the complementary cues underlying heterogeneous modalities. The hierarchical arrangement of the LGM and SFT blocks enables multilevel feature extraction and fusion, further improving the completeness and descriptiveness of the learned features. We conducted extensive ablation studies and comparative experiments on three benchmark datasets, and the experimental results validated the effectiveness and superiority of the proposed method. The source code of the proposed method will be made publicly available at https://github.com/HaixiaBi1982/DSymFuser.
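The two ideas named in the abstract, a local-global mixer with learnable weights and a symmetric cross-modal fusion step, can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the function names, the softmax normalization of the two mixing weights, the single-head attention without learned projections, and the residual addition are all assumptions made for illustration. The actual design is in the paper and its repository.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def mix_local_global(local_feat, global_feat, logits):
    """Weighted sum of local and global token features.

    `logits` stands in for two learnable scalars (an assumption);
    softmax turns them into mixing weights that sum to 1.
    """
    w_local, w_global = softmax(logits)
    return [
        [w_local * l + w_global * g for l, g in zip(l_row, g_row)]
        for l_row, g_row in zip(local_feat, global_feat)
    ]


def cross_attention(queries, context):
    """Single-head scaled dot-product attention, no learned projections
    (a simplification): each query token attends over `context`."""
    dim = len(queries[0])
    out = []
    for q in queries:
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(dim)
            for k in context
        ]
        weights = softmax(scores)
        out.append(
            [sum(w * v[j] for w, v in zip(weights, context)) for j in range(dim)]
        )
    return out


def symmetric_fusion(tokens_a, tokens_b):
    """Hypothetical symmetric fusion: each modality queries the other,
    and the attended result is added back as a residual."""
    a_from_b = cross_attention(tokens_a, tokens_b)
    b_from_a = cross_attention(tokens_b, tokens_a)
    fused_a = [[x + y for x, y in zip(r, s)] for r, s in zip(tokens_a, a_from_b)]
    fused_b = [[x + y for x, y in zip(r, s)] for r, s in zip(tokens_b, b_from_a)]
    return fused_a, fused_b


if __name__ == "__main__":
    # Two toy modalities, two tokens each, feature dimension 2.
    modality_a = mix_local_global([[1.0, 0.0], [0.0, 1.0]],
                                  [[0.5, 0.5], [0.5, 0.5]], [0.0, 0.0])
    modality_b = [[0.2, 0.8], [0.9, 0.1]]
    fused_a, fused_b = symmetric_fusion(modality_a, modality_b)
    print(fused_a)
    print(fused_b)
```

The fusion step treats both branches identically, so swapping the two inputs simply swaps the two outputs. That is the sense of "symmetric" this sketch assumes.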
Pages: 15