Multi-label remote sensing classification with self-supervised gated multi-modal transformers

Cited by: 1
Authors
Liu, Na [1]
Yuan, Ye [1]
Wu, Guodong [2]
Zhang, Sai [2]
Leng, Jie [2]
Wan, Lihong [2]
Affiliations
[1] Univ Shanghai Sci & Technol, Inst Machine Intelligence, Shanghai, Peoples R China
[2] Origin Dynam Intelligent Robot Co Ltd, Zhengzhou, Peoples R China
Keywords
self-supervised learning; pre-training; vision transformer; multi-modal; gated units; BENCHMARK-ARCHIVE; LARGE-SCALE; BIGEARTHNET
DOI
10.3389/fncom.2024.1404623
Chinese Library Classification (CLC) number
Q [Biological Sciences]
Discipline classification code
07; 0710; 09
Abstract
Introduction: Following the success of Transformers in machine learning, they are attracting growing interest in remote sensing (RS). However, RS research has been hampered by the lack of large labeled datasets and by inconsistent data modalities arising from the diversity of RS platforms. With the rise of self-supervised learning (SSL) algorithms in recent years, RS researchers have begun to apply the "pre-training and fine-tuning" paradigm. Yet there is little research on multi-modal data fusion in RS: most studies use only one modality or simply concatenate data from multiple modalities.

Method: To study a more efficient multi-modal data fusion scheme, we propose a multi-modal fusion mechanism based on gated unit control (MGSViT). We pre-train a ViT model on the BigEarthNet dataset by combining two commonly used SSL algorithms, and propose intra-modal and inter-modal gated fusion units for feature learning that combine multispectral (MS) and synthetic aperture radar (SAR) data. Our method can effectively combine data from different modalities to extract key feature information.

Results and discussion: After fine-tuning, comparison experiments show that our method outperforms state-of-the-art algorithms on all downstream classification tasks, which verifies the validity of the proposed method.
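
The abstract does not give the formula of the gated fusion unit, so the following is only a minimal sketch of the general idea of gated fusion, not the MGSViT design. It assumes a simple element-wise sigmoid gate that blends an MS feature vector with a SAR feature vector; the function name gated_fusion and the parameters w_ms, w_sar and bias are hypothetical and would be learned in a real model.

```python
import math


def sigmoid(x):
    """Logistic function mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(ms_feat, sar_feat, w_ms, w_sar, bias):
    """Blend two equal-length feature vectors with an element-wise gate.

    Assumed (illustrative) form, not taken from the paper:
        gate[i]  = sigmoid(w_ms[i] * ms[i] + w_sar[i] * sar[i] + bias[i])
        fused[i] = gate[i] * ms[i] + (1 - gate[i]) * sar[i]
    """
    if not (len(ms_feat) == len(sar_feat) == len(w_ms) == len(w_sar) == len(bias)):
        raise ValueError("all inputs must have the same length")
    fused = []
    for m, s, wm, ws, b in zip(ms_feat, sar_feat, w_ms, w_sar, bias):
        gate = sigmoid(wm * m + ws * s + b)
        # gate near 1 keeps the MS feature, gate near 0 keeps the SAR feature
        fused.append(gate * m + (1.0 - gate) * s)
    return fused


# Toy feature vectors standing in for encoder outputs (made-up values).
ms = [0.2, -1.0, 3.0]
sar = [1.0, 0.5, -2.0]
zeros = [0.0, 0.0, 0.0]

# With zero weights and zero bias the gate is 0.5, so the result is the mean.
print(gated_fusion(ms, sar, zeros, zeros, zeros))
```

Because the gate lies strictly between 0 and 1, each fused value stays between the two input values, which is what distinguishes this kind of weighting from plain concatenation of the modalities.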
Pages: 15
Related papers
50 in total
  • [21] Self-supervised opinion summarization with multi-modal knowledge graph
    Jin, Lingyun
    Chen, Jingqiang
    Journal of Intelligent Information Systems, 2024, 62 : 191 - 208
  • [22] Self-supervised opinion summarization with multi-modal knowledge graph
    Jin, Lingyun
    Chen, Jingqiang
    JOURNAL OF INTELLIGENT INFORMATION SYSTEMS, 2024, 62 (01) : 191 - 208
  • [23] SELF-SUPERVISED LEARNING OF MULTI-MODAL COOPERATION FOR SAR DESPECKLING
    Gaya, Victor
    Dalsasso, Emanuele
    Denis, Loic
    Tupin, Florence
    Pinel-Puyssegur, Beatrice
    Guerin, Cyrielle
    IGARSS 2024-2024 IEEE INTERNATIONAL GEOSCIENCE AND REMOTE SENSING SYMPOSIUM, IGARSS 2024, 2024, : 2180 - 2183
  • [24] Multi-modal, Multi-task and Multi-label for Music Genre Classification and Emotion Regression
    Pandeya, Yagya Raj
    You, Jie
    Bhattarai, Bhuwan
    Lee, Joonwhoan
    12TH INTERNATIONAL CONFERENCE ON ICT CONVERGENCE (ICTC 2021): BEYOND THE PANDEMIC ERA WITH ICT CONVERGENCE INNOVATION, 2021, : 1042 - 1045
  • [25] Exploring Self-Supervised Learning for Multi-Modal Remote Sensing Pre-Training via Asymmetric Attention Fusion
    Xu, Guozheng
    Jiang, Xue
    Li, Xiangtai
    Zhang, Ze
    Liu, Xingzhao
    REMOTE SENSING, 2023, 15 (24)
  • [26] MULTI-LABEL CLASSIFICATION WITH SINGLE POSITIVE LABEL FOR REMOTE SENSING IMAGE
    Fujii, Keigo
    Iwasaki, Akira
    IGARSS 2023 - 2023 IEEE INTERNATIONAL GEOSCIENCE AND REMOTE SENSING SYMPOSIUM, 2023, : 5870 - 5873
  • [27] Multi-modal Multi-label Emotion Detection with Modality and Label Dependence
    Zhang, Dong
    Ju, Xincheng
    Li, Junhui
    Li, Shoushan
    Zhu, Qiaoming
    Zhou, Guodong
    PROCEEDINGS OF THE 2020 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP), 2020, : 3584 - 3593
  • [28] Rethinking Modal-oriented Label Correlations for Multi-modal Multi-label Learning
    Zhang, Yi
    Shen, Jundong
    Zhang, Zhecheng
    Zhang, Lei
    Wang, Chongjun
    2020 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2020,
  • [29] Open-Vocabulary Multi-Label Classification via Multi-Modal Knowledge Transfer
    He, Sunan
    Guo, Taian
    Dai, Tao
    Qiao, Ruizhi
    Shu, Xiujun
    Ren, Bo
    Xia, Shu-Tao
    THIRTY-SEVENTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 37 NO 1, 2023, : 808 - 816
  • [30] Multi-label modality enhanced attention based self-supervised deep cross-modal hashing
    Zou, Xitao
    Wu, Song
    Zhang, Nian
    Bakker, Erwin M.
    Knowledge-Based Systems, 2022, 239