Modality Fusion Vision Transformer for Hyperspectral and LiDAR Data Collaborative Classification

Citations: 17
|
Authors
Yang, Bin [1 ]
Wang, Xuan [2 ]
Xing, Ying [2 ,3 ]
Cheng, Chen [4 ]
Jiang, Weiwei [5 ,6 ]
Feng, Quanlong [7 ]
Affiliations
[1] China Unicom Res Inst, Graph Neural Network & Artificial Intelligence Team, Beijing 100032, Peoples R China
[2] Beijing Univ Posts & Telecommun, Sch Artificial Intelligence, Beijing 100876, Peoples R China
[3] Yunnan Univ, Yunnan Key Lab Software Engn, Kunming 650500, Peoples R China
[4] China Unicom Res Inst, Network Technol Res Ctr, Beijing 100032, Peoples R China
[5] Beijing Univ Posts & Telecommun, Sch Informat & Commun Engn, Beijing 100876, Peoples R China
[6] Anhui Univ, Key Lab Universal Wireless Commun, Minist Educ, Anhui Prov Key Lab Multimodal Cognit Computat, Hefei 230039, Peoples R China
[7] China Agr Univ, Geog Informat Engn, Beijing 100083, Peoples R China
Keywords
Feature extraction; Laser radar; Transformers; Hyperspectral imaging; Data mining; Data models; Vectors; Cross-attention (CA); hyperspectral image (HSI); light detection and ranging (LiDAR); modality fusion; vision transformer (ViT); EXTINCTION PROFILES;
DOI
10.1109/JSTARS.2024.3415729
CLC Classification Number
TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology];
Subject Classification Codes
0808 ; 0809 ;
Abstract
In recent years, collaborative classification of multimodal data, e.g., hyperspectral image (HSI) and light detection and ranging (LiDAR), has been widely used to improve remote sensing image classification accuracy. However, existing fusion approaches for HSI and LiDAR suffer from limitations. Fusing the heterogeneous features of HSI and LiDAR is challenging, leading to incomplete utilization of information for category representation. In addition, during the extraction of spatial features from HSI, the spectral and spatial information are often disjointed, making it difficult to fully exploit the rich spectral information in hyperspectral data. To address these issues, we propose a multimodal data fusion framework specifically designed for HSI and LiDAR fusion classification, called the modality fusion vision transformer. We design a stackable modality fusion block as the core of our model. Specifically, these blocks mainly consist of multimodal cross-attention modules and spectral self-attention modules. The proposed multimodal cross-attention module for feature fusion addresses the insufficient fusion of heterogeneous HSI and LiDAR features for category representation. Compared to other cross-attention methods, it reduces the alignment requirements between modal feature spaces during cross-modal fusion. The spectral self-attention module preserves spatial features while exploiting the rich spectral information and participating in the extraction of spatial features from HSI. Ultimately, we achieve overall classification accuracies of 99.91%, 99.59%, and 96.98% on three benchmark datasets, respectively, surpassing all state-of-the-art methods and demonstrating the stability and effectiveness of our model.
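The core fusion idea in the abstract can be illustrated with a minimal single-head cross-attention sketch, where queries come from one modality (HSI tokens) and keys/values from the other (LiDAR tokens). This is an illustrative assumption in NumPy, not the paper's implementation; all names, dimensions, and weight initializations here are hypothetical.

```python
# Minimal single-head cross-attention sketch (NumPy) for fusing HSI and
# LiDAR token sequences. Illustrative only; not the paper's architecture.
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(hsi_tokens, lidar_tokens, d_k=16, seed=0):
    """Queries from HSI tokens; keys/values from LiDAR tokens."""
    rng = np.random.default_rng(seed)
    d_h, d_l = hsi_tokens.shape[1], lidar_tokens.shape[1]
    # Random projection weights stand in for learned parameters.
    W_q = rng.standard_normal((d_h, d_k)) / np.sqrt(d_h)
    W_k = rng.standard_normal((d_l, d_k)) / np.sqrt(d_l)
    W_v = rng.standard_normal((d_l, d_k)) / np.sqrt(d_l)
    Q, K, V = hsi_tokens @ W_q, lidar_tokens @ W_k, lidar_tokens @ W_v
    # Each HSI token attends over all LiDAR tokens: (n_hsi, n_lidar).
    attn = softmax(Q @ K.T / np.sqrt(d_k))
    return attn @ V  # fused HSI-side representation, (n_hsi, d_k)

hsi = np.random.default_rng(1).standard_normal((8, 32))    # 8 HSI tokens, dim 32
lidar = np.random.default_rng(2).standard_normal((4, 8))   # 4 LiDAR tokens, dim 8
fused = cross_attention(hsi, lidar)
print(fused.shape)  # (8, 16)
```

Note that the two modalities may have different token counts and feature dimensions; attending from one modality to the other sidesteps the need to align their feature spaces element-wise, which is the property the abstract highlights.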
Pages: 17052-17065
Number of pages: 14
Related Papers
50 records in total
  • [21] Autoencoder-Based Fusion Classification of Hyperspectral and LiDAR Data
    Wang Yibo
    Dai Song
    Song Dongmei
    Cao Guofa
    Ren Jie
    LASER & OPTOELECTRONICS PROGRESS, 2024, 61 (12)
  • [22] Coupled adversarial learning for fusion classification of hyperspectral and LiDAR data
    Lu, Ting
    Ding, Kexin
    Fu, Wei
    Li, Shutao
    Guo, Anjing
    INFORMATION FUSION, 2023, 93 : 118 - 131
  • [23] Joint Classification of Hyperspectral and LiDAR Data Using a Hierarchical CNN and Transformer
    Zhao, Guangrui
    Ye, Qiaolin
    Sun, Le
    Wu, Zebin
    Pan, Chengsheng
    Jeon, Byeungwoo
    IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2023, 61
  • [24] Feature-Decision Level Collaborative Fusion Network for Hyperspectral and LiDAR Classification
    Zhang, Shenfu
    Meng, Xiangchao
    Liu, Qiang
    Yang, Gang
    Sun, Weiwei
    REMOTE SENSING, 2023, 15 (17)
  • [25] Multimodal Transformer Network for Hyperspectral and LiDAR Classification
    Zhang, Yiyan
    Xu, Shufang
    Hong, Danfeng
    Gao, Hongmin
    Zhang, Chenkai
    Bi, Meiqiao
    Li, Chenming
    IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2023, 61
  • [26] BIHAF-Net: Bilateral Interactive Hierarchical Adaptive Fusion Network for Collaborative Classification of Hyperspectral and LiDAR Data
    Zhao, Yunji
    Bao, Wenming
    Xu, Jun
    Xu, Xiaozhuo
    IEEE JOURNAL OF SELECTED TOPICS IN APPLIED EARTH OBSERVATIONS AND REMOTE SENSING, 2024, 17 : 15971 - 15988
  • [27] Collaborative Contrastive Learning for Hyperspectral and LiDAR Classification
    Jia, Sen
    Zhou, Xi
    Jiang, Shuguo
    He, Ruyan
    IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2023, 61
  • [28] Classification of hyperspectral and LIDAR data using extinction profiles with feature fusion
    Zhang, Mengmeng
    Ghamisi, Pedram
    Li, Wei
    REMOTE SENSING LETTERS, 2017, 8 (10) : 957 - 966
  • [29] Fusion of waveform LiDAR data and hyperspectral imagery for land cover classification
    Wang, Hongzhou
    Glennie, Craig
    ISPRS JOURNAL OF PHOTOGRAMMETRY AND REMOTE SENSING, 2015, 108 : 1 - 11
  • [30] SEMI-SUPERVISED GRAPH FUSION OF HYPERSPECTRAL AND LIDAR DATA FOR CLASSIFICATION
    Liao, Wenzhi
    Xia, Junshi
    Du, Peijun
    Philips, Wilfried
    2015 IEEE INTERNATIONAL GEOSCIENCE AND REMOTE SENSING SYMPOSIUM (IGARSS), 2015, : 53 - 56