Multi-label remote sensing classification with self-supervised gated multi-modal transformers

Cited: 1
Authors
Liu, Na [1 ]
Yuan, Ye [1 ]
Wu, Guodong [2 ]
Zhang, Sai [2 ]
Leng, Jie [2 ]
Wan, Lihong [2 ]
Affiliations
[1] Univ Shanghai Sci & Technol, Inst Machine Intelligence, Shanghai, Peoples R China
[2] Origin Dynam Intelligent Robot Co Ltd, Zhengzhou, Peoples R China
Keywords
self-supervised learning; pre-training; vision transformer; multi-modal; gated units; benchmark archive; large-scale; BigEarthNet
DOI
10.3389/fncom.2024.1404623
Chinese Library Classification (CLC)
Q [Biological Sciences]
Subject Classification Codes
07; 0710; 09
Abstract
Introduction: With the great success of Transformers in machine learning, they are also attracting widespread interest in remote sensing (RS). However, RS research has been hampered by the lack of large labeled datasets and by the inconsistency of data modalities arising from the diversity of RS platforms. With the rise of self-supervised learning (SSL) algorithms in recent years, RS researchers have begun to adopt the "pre-training and fine-tuning" paradigm. Nevertheless, little work has addressed multi-modal data fusion in RS: most existing approaches either use data from a single modality or simply concatenate multiple modalities without deeper fusion.
Method: To study a more efficient multi-modal data fusion scheme, we propose a multi-modal fusion mechanism based on gated-unit control (MGSViT). We pre-train a ViT model on the BigEarthNet dataset by combining two commonly used SSL algorithms, and propose intra-modal and inter-modal gated fusion units for feature learning that combine multispectral (MS) and synthetic aperture radar (SAR) data. Our method effectively combines the different modalities to extract key feature information.
Results and discussion: After fine-tuning, comparison experiments show that our method outperforms state-of-the-art algorithms on all downstream classification tasks, verifying the validity of the proposed approach.
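The record only summarizes MGSViT at a high level, so as an illustration of what an inter-modal gated fusion unit might look like, here is a minimal PyTorch sketch, assuming ViT-style patch tokens of equal dimension for the MS and SAR branches. The class name GatedFusionUnit, the sigmoid gating formulation, and all dimensions are hypothetical and are not taken from the paper.

```python
# Minimal sketch of a gated multi-modal fusion unit (illustrative assumption,
# not the paper's exact MGSViT architecture).
import torch
import torch.nn as nn


class GatedFusionUnit(nn.Module):
    """Fuse MS and SAR token embeddings with a learned sigmoid gate."""

    def __init__(self, dim: int):
        super().__init__()
        # The gate is computed from the concatenated modalities and decides,
        # per feature channel, how much each modality contributes.
        self.gate = nn.Sequential(
            nn.Linear(2 * dim, dim),
            nn.Sigmoid(),
        )
        self.proj = nn.Linear(dim, dim)

    def forward(self, ms_tokens: torch.Tensor, sar_tokens: torch.Tensor) -> torch.Tensor:
        # ms_tokens, sar_tokens: (batch, num_tokens, dim)
        g = self.gate(torch.cat([ms_tokens, sar_tokens], dim=-1))
        fused = g * ms_tokens + (1.0 - g) * sar_tokens
        return self.proj(fused)


if __name__ == "__main__":
    # Toy usage: fuse two modalities of ViT-style patch tokens.
    fusion = GatedFusionUnit(dim=768)
    ms = torch.randn(2, 196, 768)   # multispectral patch tokens
    sar = torch.randn(2, 196, 768)  # SAR patch tokens
    out = fusion(ms, sar)
    print(out.shape)  # torch.Size([2, 196, 768])
```

The point of the gate, in contrast to simple concatenation, is that the network can weight each feature channel of each token toward whichever modality is more informative, which matches the abstract's motivation of going beyond roughly splicing modalities together.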
Pages: 15
Related Papers
50 records in total
  • [11] Self-supervised multi-modal fusion network for multi-modal thyroid ultrasound image diagnosis
    Xiang, Zhuo
    Zhuo, Qiuluan
    Zhao, Cheng
    Deng, Xiaofei
    Zhu, Ting
    Wang, Tianfu
    Jiang, Wei
    Lei, Baiying
    COMPUTERS IN BIOLOGY AND MEDICINE, 2022, 150
  • [12] Integrating remote sensing with OpenStreetMap data for comprehensive scene understanding through multi-modal self-supervised learning
    Bai, Lubin
    Zhang, Xiuyuan
    Wang, Haoyu
    Du, Shihong
    REMOTE SENSING OF ENVIRONMENT, 2025, 318
  • [13] A Deep Multi-Modal CNN for Multi-Instance Multi-Label Image Classification
    Song, Lingyun
    Liu, Jun
    Qian, Buyue
    Sun, Mingxuan
    Yang, Kuan
    Sun, Meng
    Abbas, Samar
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2018, 27 (12) : 6025 - 6038
  • [14] Multi-modal Contextual Prompt Learning for Multi-label Classification with Partial Labels
    Wang, Rui
    Pan, Zhengxin
    Wu, Fangyu
    Lv, Yifan
    Zhang, Bailing
    2024 16TH INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND COMPUTING, ICMLC 2024, 2024, : 517 - 524
  • [15] Multi-Label Classification of Fundus Images With Graph Convolutional Network and Self-Supervised Learning
    Lin, Jinke
    Cai, Qingling
    Lin, Manying
    IEEE SIGNAL PROCESSING LETTERS, 2021, 28 : 454 - 458
  • [16] OBJECT-AWARE SELF-SUPERVISED MULTI-LABEL LEARNING
    Xu, Kaixin
    Liu, Liyang
    Zhao, Ziyuan
    Zeng, Zeng
    Veeravalli, Bharadwaj
    2022 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, ICIP, 2022, : 361 - 365
  • [17] Self-supervised Multi-Modal Video Forgery Attack Detection
    Zhao, Chenhui
    Li, Xiang
    Younes, Rabih
    2023 IEEE WIRELESS COMMUNICATIONS AND NETWORKING CONFERENCE, WCNC, 2023,
  • [18] Self-Supervised Distilled Learning for Multi-modal Misinformation Identification
    Mu, Michael
    Das Bhattacharjee, Sreyasee
    Yuan, Junsong
    2023 IEEE/CVF WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV), 2023, : 2818 - 2827
  • [19] Collaboration based multi-modal multi-label learning
    Zhang, Yi
    Zhu, Yinlong
    Zhang, Zhecheng
    Wang, Chongjung
    APPLIED INTELLIGENCE, 2022, 52 (12) : 14204 - 14217