Multi-label remote sensing classification with self-supervised gated multi-modal transformers

Cited by: 1

Authors:

Liu, Na [1]

Yuan, Ye [1]

Wu, Guodong [2]

Zhang, Sai [2]

Leng, Jie [2]

Wan, Lihong [2]

Affiliations:

[1] Univ Shanghai Sci & Technol, Inst Machine Intelligence, Shanghai, Peoples R China

[2] Origin Dynam Intelligent Robot Co Ltd, Zhengzhou, Peoples R China

Source:

FRONTIERS IN COMPUTATIONAL NEUROSCIENCE | 2024 / Volume 18

Keywords:

self-supervised learning; pre-training; vision transformer; multi-modal; gated units; BENCHMARK-ARCHIVE; LARGE-SCALE; BIGEARTHNET;

DOI:

10.3389/fncom.2024.1404623

Chinese Library Classification (CLC) number:

Q [Biological sciences];

Discipline classification code:

07 ; 0710 ; 09 ;

Abstract:

Introduction: Following their success in machine learning, Transformers are attracting growing interest in remote sensing (RS). However, RS research has been hampered by the lack of large labeled datasets and by the inconsistency of data modalities caused by the diversity of RS platforms. With the rise of self-supervised learning (SSL) algorithms in recent years, RS researchers have begun to apply the "pre-training and fine-tuning" paradigm to RS. Yet few studies address multi-modal data fusion in RS: most use only one modality or simply concatenate data from multiple modalities.

Method: To study a more efficient multi-modal data fusion scheme, we propose a multi-modal fusion mechanism based on gated unit control (MGSViT). We pre-train a ViT model on the BigEarthNet dataset by combining two commonly used SSL algorithms, and propose an intra-modal and inter-modal gated fusion unit for feature learning that combines multispectral (MS) and synthetic aperture radar (SAR) data. Our method can effectively combine data from different modalities to extract key feature information.

Results and discussion: After fine-tuning and comparison experiments, our method outperforms state-of-the-art algorithms on all downstream classification tasks, which verifies the validity of the proposed method.
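The abstract does not specify how the gated fusion unit is built. As a rough illustration only, the sketch below shows one generic form of element-wise gated fusion between a multispectral (MS) and a SAR feature vector: a sigmoid gate computed from both inputs decides, per feature, how much of each modality to keep. The function name `gated_fusion`, the gate parameterisation, the feature size, and the random weights are all assumptions made for this example and are not taken from the MGSViT paper.

```python
import math
import random


def sigmoid(x: float) -> float:
    """Logistic function; maps a real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(ms_feat, sar_feat, weights, bias):
    """Fuse two equal-length feature vectors with an element-wise gate.

    gate[i]  = sigmoid(weights[i] . concat(ms_feat, sar_feat) + bias[i])
    fused[i] = gate[i] * ms_feat[i] + (1 - gate[i]) * sar_feat[i]

    This is a generic gating scheme (an assumption), not the paper's unit.
    """
    if len(ms_feat) != len(sar_feat):
        raise ValueError("both modalities must have the same feature size")
    joint = list(ms_feat) + list(sar_feat)
    fused, gates = [], []
    for i, (m, s) in enumerate(zip(ms_feat, sar_feat)):
        logit = sum(w * x for w, x in zip(weights[i], joint)) + bias[i]
        g = sigmoid(logit)
        gates.append(g)
        fused.append(g * m + (1.0 - g) * s)
    return fused, gates


# Hypothetical 4-dimensional features for the two modalities.
rng = random.Random(0)
dim = 4
ms = [rng.uniform(-1, 1) for _ in range(dim)]
sar = [rng.uniform(-1, 1) for _ in range(dim)]

# Randomly initialised gate parameters; in a real model these are learned.
W = [[rng.gauss(0, 0.5) for _ in range(2 * dim)] for _ in range(dim)]
b = [0.0] * dim

fused, gates = gated_fusion(ms, sar, W, b)
```

Because each gate lies strictly between 0 and 1, every fused value falls between the corresponding MS and SAR values. With all-zero weights and biases the gate is exactly 0.5 and the unit reduces to a plain average of the two modalities.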

Pages: 15