Modality Fusion Vision Transformer for Hyperspectral and LiDAR Data Collaborative Classification

Cited: 17
Authors
Yang, Bin [1 ]
Wang, Xuan [2 ]
Xing, Ying [2 ,3 ]
Cheng, Chen [4 ]
Jiang, Weiwei [5 ,6 ]
Feng, Quanlong [7 ]
Affiliations
[1] China Unicom Res Inst, Graph Neural Network & Artificial Intelligence Team, Beijing 100032, Peoples R China
[2] Beijing Univ Posts & Telecommun, Sch Artificial Intelligence, Beijing 100876, Peoples R China
[3] Yunnan Univ, Yunnan Key Lab Software Engn, Kunming 650500, Peoples R China
[4] China Unicom Res Inst, Network Technol Res Ctr, Beijing 100032, Peoples R China
[5] Beijing Univ Posts & Telecommun, Sch Informat & Commun Engn, Beijing 100876, Peoples R China
[6] Anhui Univ, Key Lab Universal Wireless Commun, Minist Educ, Anhui Prov Key Lab Multimodal Cognit Computat, Hefei 230039, Peoples R China
[7] China Agr Univ, Geog Informat Engn, Beijing 100083, Peoples R China
Keywords
Feature extraction; Laser radar; Transformers; Hyperspectral imaging; Data mining; Data models; Vectors; Cross-attention (CA); hyperspectral image (HSI); light detection and ranging (LiDAR); modality fusion; vision transformer (ViT); EXTINCTION PROFILES;
DOI
10.1109/JSTARS.2024.3415729
CLC Classification Number
TM [Electrical Technology]; TN [Electronic Technology, Communication Technology];
Discipline Classification Code
0808 ; 0809 ;
Abstract
In recent years, collaborative classification of multimodal data, e.g., hyperspectral image (HSI) and light detection and ranging (LiDAR) data, has been widely used to improve remote sensing image classification accuracy. However, existing fusion approaches for HSI and LiDAR have limitations. Fusing the heterogeneous features of HSI and LiDAR has proven challenging, leading to incomplete utilization of information for category representation. In addition, during the extraction of spatial features from HSI, spectral and spatial information are often disjointed, making it difficult to fully exploit the rich spectral information in hyperspectral data. To address these issues, we propose a multimodal data fusion framework specifically designed for HSI and LiDAR fusion classification, called the modality fusion vision transformer. At the core of our model is a stackable modality fusion block, which mainly consists of multimodal cross-attention modules and spectral self-attention modules. The proposed multimodal cross-attention module for feature fusion addresses the insufficient fusion of heterogeneous HSI and LiDAR features for category representation; compared with other cross-attention methods, it relaxes the alignment requirements between modal feature spaces during cross-modal fusion. The spectral self-attention module preserves spatial features while exploiting the rich spectral information and participating in the extraction of spatial features from HSI. Ultimately, we achieve overall classification accuracies of 99.91%, 99.59%, and 96.98% on three benchmark datasets, surpassing state-of-the-art methods and demonstrating the stability and effectiveness of our model.
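The multimodal cross-attention the abstract describes (HSI tokens attending to LiDAR tokens of a different feature dimension) can be sketched in simplified, single-head NumPy form. This is an illustrative sketch, not the paper's implementation: the token counts, projection dimension `d_k`, and random weight initialization are all assumptions, and the paper's module additionally stacks these blocks with spectral self-attention.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(hsi_tokens, lidar_tokens, d_k=16, seed=0):
    """One cross-attention head: HSI tokens query LiDAR tokens.

    Because queries and keys/values come from different modalities,
    separate projections map each modality's feature dimension into a
    shared d_k-dimensional space, so the feature spaces need not be
    aligned beforehand.
    """
    rng = np.random.default_rng(seed)
    d_h = hsi_tokens.shape[-1]   # HSI feature dim
    d_l = lidar_tokens.shape[-1]  # LiDAR feature dim
    W_q = rng.standard_normal((d_h, d_k)) / np.sqrt(d_h)
    W_k = rng.standard_normal((d_l, d_k)) / np.sqrt(d_l)
    W_v = rng.standard_normal((d_l, d_k)) / np.sqrt(d_l)
    Q = hsi_tokens @ W_q          # (n_hsi, d_k)
    K = lidar_tokens @ W_k        # (n_lidar, d_k)
    V = lidar_tokens @ W_v        # (n_lidar, d_k)
    attn = softmax(Q @ K.T / np.sqrt(d_k))  # (n_hsi, n_lidar)
    return attn @ V               # fused features, one row per HSI token

hsi = np.random.default_rng(1).standard_normal((8, 32))    # 8 tokens, 32-dim
lidar = np.random.default_rng(2).standard_normal((4, 8))   # 4 tokens, 8-dim
fused = cross_attention(hsi, lidar)
print(fused.shape)  # (8, 16)
```

Each fused row is a LiDAR-conditioned representation of one HSI token; in the paper's framework such outputs would then pass through the spectral self-attention module within each stackable fusion block.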
Pages: 17052-17065 (14 pages)
Related Papers (50 total)
  • [1] Deep Hierarchical Vision Transformer for Hyperspectral and LiDAR Data Classification
    Xue, Zhixiang
    Tan, Xiong
    Yu, Xuchu
    Liu, Bing
    Yu, Anzhu
    Zhang, Pengqiang
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2022, 31 : 3095 - 3110
  • [2] Collaborative classification of hyperspectral and LiDAR data based on CNN-transformer
    Wu H.
    Dai S.
    Wang A.
Iwahori Y.
    Yu X.
    Guangxue Jingmi Gongcheng/Optics and Precision Engineering, 2024, 32 (07): : 1087 - 1100
  • [3] Interactive transformer and CNN network for fusion classification of hyperspectral and LiDAR data
    Wang, Leiquan
    Liu, Wenwen
    Lyu, Dong
    Zhang, Peiying
    Guo, Fangming
    Hu, Yabin
    Xu, Mingming
INTERNATIONAL JOURNAL OF REMOTE SENSING, 2024
  • [4] COLLABORATIVE CLASSIFICATION OF HYPERSPECTRAL AND LIDAR DATA WITH INFORMATION FUSION AND DEEP NETS
    Chen, Chen
    Zhao, Xudong
    Li, Wei
    Tao, Ran
    Du, Qian
    2019 IEEE INTERNATIONAL GEOSCIENCE AND REMOTE SENSING SYMPOSIUM (IGARSS 2019), 2019, : 2475 - 2478
  • [5] MCFT: Multimodal Contrastive Fusion Transformer for Classification of Hyperspectral Image and LiDAR Data
    Feng, Yining
    Jin, Jiarui
    Yin, Yin
    Song, Chuanming
    Wang, Xianghai
    IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2024, 62
  • [6] Multiple Information Collaborative Fusion Network for Joint Classification of Hyperspectral and LiDAR Data
    Tang, Xu
    Zou, Yizhou
    Ma, Jingjing
    Zhang, Xiangrong
    Liu, Fang
    Jiao, Licheng
    IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2024, 62
  • [7] Hyperspectral and LiDAR Data Fusion Using Collaborative Representation
    Du, Qian
    Ball, John E.
    Ge, Chiru
    ALGORITHMS, TECHNOLOGIES, AND APPLICATIONS FOR MULTISPECTRAL AND HYPERSPECTRAL IMAGERY XXVI, 2020, 11392
  • [8] Hyperspectral and LiDAR Data Classification Using Kernel Collaborative Representation Based Residual Fusion
    Ge, Chiru
    Du, Qian
    Li, Wei
    Li, Yunsong
    Sun, Weiwei
    IEEE JOURNAL OF SELECTED TOPICS IN APPLIED EARTH OBSERVATIONS AND REMOTE SENSING, 2019, 12 (06) : 1963 - 1973
  • [9] Ternary Modality Contrastive Learning for Hyperspectral and LiDAR Data Classification
    Xia, Shuxiang
    Zhang, Xiaohua
    Meng, Hongyun
    Jiao, Licheng
    IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2024, 62
  • [10] Multiscale Attention Feature Fusion Based on Improved Transformer for Hyperspectral Image and LiDAR Data Classification
    Wang, Aili
    Lei, Guilong
    Dai, Shiyu
    Wu, Haibin
    Iwahori, Yuji
    IEEE JOURNAL OF SELECTED TOPICS IN APPLIED EARTH OBSERVATIONS AND REMOTE SENSING, 2025, 18 : 4124 - 4140