Multi-label remote sensing classification with self-supervised gated multi-modal transformers

Cited by: 1
Authors
Liu, Na [1]
Yuan, Ye [1]
Wu, Guodong [2]
Zhang, Sai [2]
Leng, Jie [2]
Wan, Lihong [2]
Affiliations
[1] Univ Shanghai Sci & Technol, Inst Machine Intelligence, Shanghai, Peoples R China
[2] Origin Dynam Intelligent Robot Co Ltd, Zhengzhou, Peoples R China
Keywords
self-supervised learning; pre-training; vision transformer; multi-modal; gated units; BENCHMARK-ARCHIVE; LARGE-SCALE; BIGEARTHNET
DOI
10.3389/fncom.2024.1404623
Chinese Library Classification (CLC) number
Q [Biological Sciences]
Subject classification codes
07; 0710; 09
Abstract
Introduction: With the great success of Transformers in machine learning, they are attracting growing interest in remote sensing (RS). However, RS research has been hampered by the lack of large labeled datasets and by inconsistent data modalities caused by the diversity of RS platforms. With the rise of self-supervised learning (SSL) algorithms in recent years, RS researchers have begun to apply the "pre-training and fine-tuning" paradigm to RS. Nevertheless, there has been little research on multi-modal data fusion in RS: most studies use only one modality or crudely concatenate data from multiple modalities.
Method: To study a more efficient multi-modal data fusion scheme, we propose a multi-modal fusion mechanism based on gated unit control (MGSViT). We pre-train a ViT model on the BigEarthNet dataset by combining two commonly used SSL algorithms, and propose intra-modal and inter-modal gated fusion units for feature learning that combine multispectral (MS) and synthetic aperture radar (SAR) data. Our method can effectively combine data from different modalities to extract key feature information.
Results and discussion: After fine-tuning and comparison experiments, our method outperforms the most advanced algorithms in all downstream classification tasks, which verifies the validity of the proposed method.
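Illustrative note: the abstract does not give the equations of the gated fusion units, so the following is only a generic sketch of gated feature fusion, not the MGSViT design. It assumes two already-extracted feature arrays of equal width (one per modality) and a single learned gate; the names `gated_fusion`, `weight`, and `bias` are hypothetical and chosen for this example.

```python
import numpy as np


def sigmoid(z):
    """Element-wise logistic function."""
    return 1.0 / (1.0 + np.exp(-z))


def gated_fusion(ms_feat, sar_feat, weight, bias):
    """Generic gated fusion of two modality features (assumed form, not the paper's).

    ms_feat, sar_feat: arrays of shape (batch, d)
    weight: array of shape (2 * d, d); bias: array of shape (d,)
    """
    # Concatenate both modalities so the gate can look at each of them.
    joint = np.concatenate([ms_feat, sar_feat], axis=-1)
    # Gate values lie in (0, 1) and decide, per feature, how much MS to keep.
    gate = sigmoid(joint @ weight + bias)
    # Convex combination of the two modality features.
    return gate * ms_feat + (1.0 - gate) * sar_feat


# Toy usage with random stand-ins for MS and SAR features.
rng = np.random.default_rng(0)
d = 8
ms = rng.normal(size=(4, d))
sar = rng.normal(size=(4, d))
W = rng.normal(scale=0.1, size=(2 * d, d))
b = np.zeros(d)
fused = gated_fusion(ms, sar, W, b)
```

With an all-zero `weight` and `bias`, every gate value is 0.5 and the output reduces to the plain average of the two inputs, which makes a convenient sanity check.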
Pages: 15
Related papers
50 in total
  • [31] Multi-label modality enhanced attention based self-supervised deep cross-modal hashing
    Zou, Xitao
    Wu, Song
    Zhang, Nian
    Bakker, Erwin M.
    KNOWLEDGE-BASED SYSTEMS, 2022, 239
  • [32] Semi-Supervised Multi-Modal Multi-Instance Multi-Label Deep Network with Optimal Transport
    Yang, Yang
    Fu, Zhao-Yang
    Zhan, De-Chuan
    Liu, Zhi-Bin
    Jiang, Yuan
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2021, 33 (02) : 696 - 709
  • [33] Self-Supervised Multi-Label Transformation Prediction for Video Representation Learning
    Assefa, Maregu
    Jiang, Wei
    Yilma, Getinet
    Kumeda, Bulbula
    Ayalew, Melese
    Seid, Mohammed
    JOURNAL OF CIRCUITS SYSTEMS AND COMPUTERS, 2022, 31 (09)
  • [34] Self-Supervised Entity Alignment Based on Multi-Modal Contrastive Learning
    Liu, Bo
    Song, Ruoyi
    Xiang, Yuejia
    Du, Junbo
    Ruan, Weijian
    Hu, Jinhui
    IEEE/CAA Journal of Automatica Sinica, 2022, 9 (11) : 2031 - 2033
  • [35] Self-supervised Multi-modal Alignment for Whole Body Medical Imaging
    Windsor, Rhydian
    Jamaludin, Amir
    Kadir, Timor
    Zisserman, Andrew
    MEDICAL IMAGE COMPUTING AND COMPUTER ASSISTED INTERVENTION - MICCAI 2021, PT II, 2021, 12902 : 90 - 101
  • [36] Multi-modal Food Recommendation Using Clustering and Self-supervised Learning
    Zhang, Yixin
    Zhou, Xin
    Meng, Qianwen
    Zhu, Fanglin
    Xu, Yonghui
    Shen, Zhiqi
    Cui, Lizhen
    PRICAI 2024: TRENDS IN ARTIFICIAL INTELLIGENCE, PT I, 2025, 15281 : 269 - 281
  • [37] Self-Supervised Entity Alignment Based on Multi-Modal Contrastive Learning
    Liu, Bo
    Song, Ruoyi
    Xiang, Yuejia
    Du, Junbo
    Ruan, Weijian
    Hu, Jinhui
    IEEE-CAA JOURNAL OF AUTOMATICA SINICA, 2022, 9 (11) : 2031 - 2033
  • [38] General Multi-label Image Classification with Transformers
    Lanchantin, Jack
    Wang, Tianlu
    Ordonez, Vicente
    Qi, Yanjun
    2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021, 2021 : 16473 - 16483
  • [39] Highly Interactive Self-Supervised Learning for Multi-Modal Trajectory Prediction
    Xie, Wenda
    Liu, Yahui
    Zhao, Hongxia
    Guo, Chao
    Dai, Xingyuan
    Lv, Yisheng
    IFAC PAPERSONLINE, 2024, 58 (10) : 231 - 236
  • [40] Self-supervised Label-Visual Correlation Hashing for Multi-label Image Retrieval
    Liu, Yu
    Xie, Yanzhao
    Song, Jingkuan
    Wei, Rukai
    Zhou, Ke
    WEB AND BIG DATA, PT II, APWEB-WAIM 2022, 2023, 13422 : 129 - 143