Modality Fusion Vision Transformer for Hyperspectral and LiDAR Data Collaborative Classification

Cited by: 17
Authors
Yang, Bin [1 ]
Wang, Xuan [2 ]
Xing, Ying [2 ,3 ]
Cheng, Chen [4 ]
Jiang, Weiwei [5 ,6 ]
Feng, Quanlong [7 ]
Affiliations
[1] China Unicom Res Inst, Graph Neural Network & Artificial Intelligence Team, Beijing 100032, Peoples R China
[2] Beijing Univ Posts & Telecommun, Sch Artificial Intelligence, Beijing 100876, Peoples R China
[3] Yunnan Univ, Yunnan Key Lab Software Engn, Kunming 650500, Peoples R China
[4] China Unicom Res Inst, Network Technol Res Ctr, Beijing 100032, Peoples R China
[5] Beijing Univ Posts & Telecommun, Sch Informat & Commun Engn, Beijing 100876, Peoples R China
[6] Anhui Univ, Key Lab Universal Wireless Commun, Minist Educ, Anhui Prov Key Lab Multimodal Cognit Computat, Hefei 230039, Peoples R China
[7] China Agr Univ, Geog Informat Engn, Beijing 100083, Peoples R China
Keywords
Feature extraction; Laser radar; Transformers; Hyperspectral imaging; Data mining; Data models; Vectors; Cross-attention (CA); hyperspectral image (HSI); light detection and ranging (LiDAR); modality fusion; vision transformer (ViT); EXTINCTION PROFILES;
DOI
10.1109/JSTARS.2024.3415729
CLC Classification
TM [Electrical Engineering]; TN [Electronics & Communication Technology];
Discipline Codes
0808 ; 0809 ;
Abstract
In recent years, collaborative classification of multimodal data, e.g., hyperspectral image (HSI) and light detection and ranging (LiDAR) data, has been widely used to improve remote sensing image classification accuracy. However, existing fusion approaches for HSI and LiDAR have limitations. Fusing the heterogeneous features of HSI and LiDAR has proved challenging, leading to incomplete use of the available information for category representation. In addition, during the extraction of spatial features from HSI, spectral and spatial information are often treated separately, making it difficult to fully exploit the rich spectral information in hyperspectral data. To address these issues, we propose a multimodal data fusion framework designed specifically for HSI and LiDAR fusion classification, called the modality fusion vision transformer. At the core of the model is a stackable modality fusion block, consisting mainly of multimodal cross-attention modules and spectral self-attention modules. The proposed multimodal cross-attention module for feature fusion addresses the insufficient fusion of heterogeneous HSI and LiDAR features for category representation; compared with other cross-attention methods, it relaxes the alignment requirements between modal feature spaces during cross-modal fusion. The spectral self-attention module preserves spatial features while exploiting the rich spectral information and participating in the extraction of spatial features from HSI. Ultimately, we achieve overall classification accuracies of 99.91%, 99.59%, and 96.98% on three benchmark datasets, respectively, surpassing state-of-the-art methods and demonstrating the stability and effectiveness of our model.
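The record contains no implementation details, but the cross-modal fusion the abstract describes rests on standard cross-attention: queries come from one modality while keys and values come from the other, so each HSI token aggregates LiDAR features weighted by similarity. The single-head, plain-Python sketch below illustrates only that general mechanism; the token shapes, weight matrices, and function names are illustrative assumptions, not the paper's actual design.

```python
import math

def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def cross_attention(hsi_tokens, lidar_tokens, Wq, Wk, Wv):
    """Single-head cross-attention: HSI queries attend over LiDAR
    keys/values, i.e. softmax(Q K^T / sqrt(d)) V."""
    Q = matmul(hsi_tokens, Wq)    # queries from the HSI branch
    K = matmul(lidar_tokens, Wk)  # keys from the LiDAR branch
    V = matmul(lidar_tokens, Wv)  # values from the LiDAR branch
    d = len(Q[0])
    scores = [[sum(q * k for q, k in zip(qr, kr)) / math.sqrt(d)
               for kr in K] for qr in Q]
    weights = [softmax(row) for row in scores]
    return matmul(weights, V)     # fused HSI tokens, one row per query
```

Because the softmax weights need not align the two feature spaces exactly (only produce meaningful similarity scores), a learned variant of this pattern can fuse heterogeneous modalities, which is consistent with the abstract's claim of relaxed alignment requirements.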
Pages: 17052-17065
Page count: 14
Related Papers
(50 records in total)
  • [31] Li, Jiaojiao; Ma, Yinle; Song, Rui; Xi, Bobo; Hong, Danfeng; Du, Qian. A Triplet Semisupervised Deep Network for Fusion Classification of Hyperspectral and LiDAR Data. IEEE Transactions on Geoscience and Remote Sensing, 2022, 60.
  • [32] Merentitis, Andreas; Debes, Christian; Heremans, Roel; Frangiadakis, Nikolaos. Automatic Fusion and Classification of Hyperspectral and LiDAR Data Using Random Forests. 2014 IEEE International Geoscience and Remote Sensing Symposium (IGARSS), 2014: 1245-1248.
  • [33] Luo, Renbo; Liao, Wenzhi; Zhang, Hongyan; Pi, Youguo; Philips, Wilfried. Classification of Cloudy Hyperspectral Image and LiDAR Data Based on Feature Fusion and Decision Fusion. 2016 IEEE International Geoscience and Remote Sensing Symposium (IGARSS), 2016: 2518-2521.
  • [34] Hang, R.; Sun, Y.; Liu, Q. Joint Classification of Hyperspectral and LiDAR Data Based on Inter-Modality Match Learning. National Remote Sensing Bulletin, 2024, 28(01): 154-167.
  • [35] Dai, Mofan; Xing, Shuai; Xu, Qing; Wang, Hanyun; Li, Pengcheng; Sun, Yifan; Pan, Jiechen; Li, Yuqiong. Learning Transferable Cross-Modality Representations for Few-Shot Hyperspectral and LiDAR Collaborative Classification. International Journal of Applied Earth Observation and Geoinformation, 2024, 126.
  • [36] Ge, Chiru; Du, Qian. Probability Fusion for Hyperspectral and LiDAR Data. IGARSS 2020 - 2020 IEEE International Geoscience and Remote Sensing Symposium, 2020: 2675-2678.
  • [37] Xia, Junshi; Liao, Wenzhi; Du, Peijun. Hyperspectral and LiDAR Classification With Semisupervised Graph Fusion. IEEE Geoscience and Remote Sensing Letters, 2020, 17(04): 666-670.
  • [38] Angeles Garcia-Sopo, Maria; Cuartero, Aurora; Garcia Rodriguez, Pablo; Plaza, Antonio. Hyperspectral and LiDAR Data Integration and Classification. 2015 IEEE International Geoscience and Remote Sensing Symposium (IGARSS), 2015: 57-60.
  • [39] Song, Huacui; Yang, Yuanwei; Gao, Xianjun; Zhang, Maqun; Li, Shaohua; Liu, Bo; Wang, Yanjun; Kou, Yuan. Joint Classification of Hyperspectral and LiDAR Data Using Binary-Tree Transformer Network. Remote Sensing, 2023, 15(11).
  • [40] Wang, Qingyan; Zhou, Binbin; Zhang, Junping; Xie, Jinbao; Wang, Yujing. Joint Classification of Hyperspectral Images and LiDAR Data Based on Dual-Branch Transformer. Sensors, 2024, 24(03).