Deep Symmetric Fusion Transformer for Multimodal Remote Sensing Data Classification

Cited by: 3
|
Authors
Chang, Honghao [1 ]
Bi, Haixia [1 ]
Li, Fan [1 ]
Xu, Chen [2 ,3 ]
Chanussot, Jocelyn [4 ]
Hong, Danfeng [5 ,6 ]
Affiliations
[1] Xi An Jiao Tong Univ, Sch Informat & Commun Engn, Xian 710049, Peoples R China
[2] Peng Cheng Lab, Dept Math & Fundamental Res, Shenzhen 518055, Peoples R China
[3] Xi An Jiao Tong Univ, Sch Math & Stat, Xian 710049, Peoples R China
[4] Univ Grenoble Alpes, CNRS, INRIA, Grenoble INP LJK, F-38000 Grenoble, France
[5] Chinese Acad Sci, Aerosp Informat Res Inst, Beijing 100049, Peoples R China
[6] Univ Chinese Acad Sci, Sch Elect Elect & Commun Engn, Beijing 100049, Peoples R China
Keywords
Land-cover classification; local-global mixture (LGM); multimodal feature fusion; remote sensing; symmetric fusion transformer (SFT); LAND-COVER CLASSIFICATION; LIDAR DATA;
DOI
10.1109/TGRS.2024.3476975
Chinese Library Classification
P3 [Geophysics]; P59 [Geochemistry]
Discipline Code
0708; 070902
Abstract
In recent years, multimodal remote sensing data classification (MMRSC) has evoked growing attention due to its more comprehensive and accurate delineation of Earth's surface compared to its single-modal counterpart. However, it remains challenging to capture and integrate local and global features from single-modal data. Moreover, how to fully excavate and exploit the interactions between different modalities is still an intricate issue. To this end, we propose a novel dual-branch transformer-based framework named deep symmetric fusion transformer (DSymFuser). Within the framework, each branch contains a stack of local-global mixture (LGM) blocks, to extract hierarchical and discriminative single-modal features. In each LGM block, a local-global feature mixer with learnable weights is specifically devised to adaptively aggregate the local and global features extracted with a convolutional neural network (CNN)-transformer network. Furthermore, we innovatively design a symmetric fusion transformer (SFT) that trails behind each LGM block. The elaborately designed SFT symmetrically facilitates cross-modal correlation excavation, comprehensively exploiting the complementary cues underlying heterogeneous modalities. The hierarchical construction of the LGM and SFT blocks enables feature extraction and fusion in a multilevel manner, further promoting the completeness and descriptiveness of the learned features. We conducted extensive ablation studies and comparative experiments on three benchmark datasets, and the experimental results validated the effectiveness and superiority of the proposed method. The source code of the proposed method will be available publicly at https://github.com/HaixiaBi1982/DSymFuser.
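The abstract describes a local-global feature mixer that uses learnable weights to adaptively aggregate local (CNN-derived) and global (transformer-derived) features inside each LGM block. The paper's exact formulation is not given here; the following is a minimal sketch assuming a sigmoid-gated convex combination, with all names and the toy shapes hypothetical:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def local_global_mix(local_feat, global_feat, alpha):
    """Blend local (CNN-style) and global (transformer-style) feature
    maps with a single learnable scalar `alpha`, squashed to (0, 1).
    In training, `alpha` would be updated by backpropagation; here it
    is just a fixed value to illustrate the aggregation."""
    w = sigmoid(alpha)
    return w * local_feat + (1.0 - w) * global_feat

# Toy feature maps: 4 tokens x 8 channels.
local_feat = np.ones((4, 8))    # stands in for CNN branch output
global_feat = np.zeros((4, 8))  # stands in for transformer branch output

# With alpha = 0, sigmoid(0) = 0.5, so the two branches are weighted equally.
mixed = local_global_mix(local_feat, global_feat, alpha=0.0)
```

A per-channel or per-token weight vector (broadcast over the feature map) would be a natural generalization of the scalar gate sketched above.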
Pages: 15
Related Papers (50 total)
  • [31] MCFT: Multimodal Contrastive Fusion Transformer for Classification of Hyperspectral Image and LiDAR Data
    Feng, Yining
    Jin, Jiarui
    Yin, Yin
    Song, Chuanming
    Wang, Xianghai
    IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2024, 62
  • [32] The 2017 IEEE Geoscience and Remote Sensing Society Data Fusion Contest: Open Data for Global Multimodal Land Use Classification
    Tuia, Devis
    Moser, Gabriele
    Le Saux, Bertrand
    Bechtel, Benjamin
    See, Linda
    IEEE GEOSCIENCE AND REMOTE SENSING MAGAZINE, 2017, 5 (04): : 110 - 114
  • [33] A NOVEL DEEP FEATURE FUSION NETWORK FOR REMOTE SENSING SCENE CLASSIFICATION
    Li, Yangyang
    Wang, Qi
    Liang, Xiaoxu
    Jiao, Licheng
    2019 IEEE INTERNATIONAL GEOSCIENCE AND REMOTE SENSING SYMPOSIUM (IGARSS 2019), 2019, : 5484 - 5487
  • [34] Deep Multimodal Data Fusion
    Zhao, Fei
    Zhang, Chengcui
    Geng, Baocheng
    ACM COMPUTING SURVEYS, 2024, 56 (09)
  • [35] RsMmFormer: Multimodal Transformer Using Multiscale Self-attention for Remote Sensing Image Classification
    Zhang, Bo
    Ming, Zuheng
    Liu, Yaqian
    Feng, Wei
    He, Liang
    Zhao, Kaixing
    ARTIFICIAL INTELLIGENCE, CICAI 2023, PT I, 2024, 14473 : 329 - 339
  • [36] STMSF: Swin Transformer with Multi-Scale Fusion for Remote Sensing Scene Classification
    Duan, Yingtao
    Song, Chao
    Zhang, Yifan
    Cheng, Puyu
    Mei, Shaohui
    REMOTE SENSING, 2025, 17 (04)
  • [37] Multimodal Remote Sensing Data Fusion via Coherent Point Set Analysis
    Zou, Huanxin
    Sun, Hao
    Ji, Kefeng
    Du, Chun
    Lu, Chunyan
    IEEE GEOSCIENCE AND REMOTE SENSING LETTERS, 2013, 10 (04) : 672 - 676
  • [38] Progressive Symmetric Registration for Multimodal Remote Sensing Imagery
    Yan, Heng
    Ma, Ailong
    Zhong, Yanfei
    IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2025, 63
  • [39] Fuzzy aggregation for multimodal remote sensing classification
    Nock, Kristen
    Gilmour, Elizabeth
    2020 IEEE INTERNATIONAL CONFERENCE ON FUZZY SYSTEMS (FUZZ-IEEE), 2020,
  • [40] Mutually Beneficial Transformer for Multimodal Data Fusion
    Wang, Jinping
    Tan, Xiaojun
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2023, 33 (12) : 7466 - 7479