Cross-Modal Contrastive Learning With Spatiotemporal Context for Correlation-Aware Multiscale Remote Sensing Image Retrieval

Citations: 2
Authors
Zhu, Lilu [1 ]
Wang, Yang [1 ]
Hu, Yanfeng [2 ]
Su, Xiaolu
Fu, Kun [2 ]
Affiliations
[1] Suzhou Aerosp Informat Res Inst, Suzhou 215123, Peoples R China
[2] Chinese Acad Sci, Aerosp Informat Res Inst, Beijing 100094, Peoples R China
Keywords
Remote sensing; Optical sensors; Optical imaging; Feature extraction; Semantics; Visualization; Image retrieval; Content-based remote sensing image retrieval (CBRSIR); correlation-aware retrieval; cross-modal contrastive learning; hash index code; hierarchical semantic tree; BENCHMARK; DATASET;
DOI
10.1109/TGRS.2024.3417421
Chinese Library Classification (CLC)
P3 [Geophysics]; P59 [Geochemistry];
Discipline Codes
0708 ; 070902 ;
Abstract
Optical satellites are the most widely used platforms for observing the Earth. Driven by rapidly developing multisource optical remote sensing technology, content-based remote sensing image retrieval (CBRSIR), which aims to retrieve images of interest using extracted visual features, faces new challenges arising from large data volumes, complex feature information, and varied spatiotemporal resolutions. Most previous works focus on optical image representation and its transformation into the semantic retrieval space via supervised or unsupervised learning. These retrieval methods fail to fully leverage geospatial information, especially spatiotemporal features, which can improve accuracy and efficiency. In this article, we propose a cross-modal contrastive learning method (CCLS2T) that maximizes the mutual information across multisource remote sensing platforms for correlation-aware retrieval. Specifically, we develop an asymmetric dual-encoder architecture: a vision encoder operates on multiscale visual inputs, a lightweight text encoder reconstructs spatiotemporal embeddings, and an intermediate contrastive objective is applied to the representations from the two unimodal encoders. A hash layer then transforms the deep fused features into compact hash index codes. In addition, CCLS2T exploits a prompt template (R2STFT) for multisource remote sensing retrieval to address the text heterogeneity of metadata files, and a hierarchical semantic tree (RSHST) to address the feature sparsification of semantic-aware indexing structures. Experimental results on three optical remote sensing datasets substantiate that the proposed CCLS2T improves retrieval performance by 11.64% over many existing hash learning methods and by 9.91% over server-side retrieval engines in typical optical remote sensing retrieval scenarios.
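The abstract pairs a cross-modal contrastive objective over unimodal encoder outputs with a hash layer that binarizes fused features into index codes. The paper's actual encoders, fusion, and hash layer are not specified here, so the following is only a minimal sketch of the two generic building blocks it names: a symmetric InfoNCE contrastive loss over paired image/text embeddings, and sign-based binarization into compact codes. All function names and the temperature value are illustrative assumptions, not the authors' implementation.

```python
import math

def l2_normalize(v):
    # Normalize a vector to unit length so dot products equal cosine similarity.
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]

def info_nce(image_embs, text_embs, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of paired image/text embeddings.
    Matched pairs (same batch index) are positives; all other pairings in the
    batch serve as negatives. This is the generic cross-modal contrastive
    objective, not the paper's exact loss."""
    imgs = [l2_normalize(v) for v in image_embs]
    txts = [l2_normalize(v) for v in text_embs]
    n = len(imgs)
    # Temperature-scaled cosine-similarity logits: logits[i][j] = sim(img_i, txt_j).
    logits = [[sum(a * b for a, b in zip(imgs[i], txts[j])) / temperature
               for j in range(n)] for i in range(n)]

    def cross_entropy(rows):
        # Mean of -log softmax(row)[i] with the matched index i as the target.
        loss = 0.0
        for i, row in enumerate(rows):
            m = max(row)  # subtract the max for numerical stability
            log_z = m + math.log(sum(math.exp(x - m) for x in row))
            loss += log_z - row[i]
        return loss / len(rows)

    # Average the image->text and text->image directions.
    cols = [[logits[i][j] for i in range(n)] for j in range(n)]
    return 0.5 * (cross_entropy(logits) + cross_entropy(cols))

def hash_code(embedding):
    # Binarize a fused feature vector into a compact +/-1 index code,
    # the standard sign-thresholding step used by hash retrieval layers.
    return [1 if x >= 0 else -1 for x in embedding]
```

Under this sketch, perfectly aligned image/text pairs yield a lower loss than swapped pairs, and retrieval would compare `hash_code` outputs by Hamming distance rather than dense features, which is what makes the index compact.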
Pages: 21
Related Papers
50 records in total
  • [21] Deep Multiscale Fine-Grained Hashing for Remote Sensing Cross-Modal Retrieval
    Huang, Jiaxiang
    Feng, Yong
    Zhou, Mingliang
    Xiong, Xiancai
    Wang, Yongheng
    Qiang, Baohua
    IEEE GEOSCIENCE AND REMOTE SENSING LETTERS, 2024, 21 : 1 - 5
  • [22] A TEXTURE AND SALIENCY ENHANCED IMAGE LEARNING METHOD FOR CROSS-MODAL REMOTE SENSING IMAGE-TEXT RETRIEVAL
    Yang, Rui
    Zhang, Di
    Guo, YanHe
    Wang, Shuang
    IGARSS 2023 - 2023 IEEE INTERNATIONAL GEOSCIENCE AND REMOTE SENSING SYMPOSIUM, 2023, : 4895 - 4898
  • [23] Remote Sensing Cross-Modal Retrieval by Deep Image-Voice Hashing
    Zhang, Yichao
    Zheng, Xiangtao
    Lu, Xiaoqiang
    IEEE JOURNAL OF SELECTED TOPICS IN APPLIED EARTH OBSERVATIONS AND REMOTE SENSING, 2022, 15 : 9327 - 9338
  • [24] Deep Cross-Modal Image-Voice Retrieval in Remote Sensing
    Chen, Yaxiong
    Lu, Xiaoqiang
    Wang, Shuai
    IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2020, 58 (10): : 7049 - 7061
  • [25] Structure-aware contrastive hashing for unsupervised cross-modal retrieval
    Cui, Jinrong
    He, Zhipeng
    Huang, Qiong
    Fu, Yulu
    Li, Yuting
    Wen, Jie
    NEURAL NETWORKS, 2024, 174
  • [26] Cross-Modal Compositional Learning for Multilabel Remote Sensing Image Classification
    Guo, Jie
    Jiao, Shuchang
    Sun, Hao
    Song, Bin
    Chi, Yuhao
    IEEE JOURNAL OF SELECTED TOPICS IN APPLIED EARTH OBSERVATIONS AND REMOTE SENSING, 2025, 18 : 5810 - 5823
  • [27] Momentum Cross-Modal Contrastive Learning for Video Moment Retrieval
    Han, De
    Cheng, Xing
    Guo, Nan
    Ye, Xiaochun
    Rainer, Benjamin
    Priller, Peter
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2024, 34 (07) : 5977 - 5994
  • [28] A NOVEL SELF-SUPERVISED CROSS-MODAL IMAGE RETRIEVAL METHOD IN REMOTE SENSING
    Sumbul, Gencer
    Mueller, Markus
    Demir, Beguem
    2022 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, ICIP, 2022, : 2426 - 2430
  • [29] Improving text-image cross-modal retrieval with contrastive loss
    Zhang, Chumeng
    Yang, Yue
    Guo, Junbo
    Jin, Guoqing
    Song, Dan
    Liu, An An
    MULTIMEDIA SYSTEMS, 2023, 29 (02) : 569 - 575
  • [30] Image-Text Cross-Modal Retrieval with Instance Contrastive Embedding
    Zeng, Ruigeng
    Ma, Wentao
    Wu, Xiaoqian
    Liu, Wei
    Liu, Jie
    ELECTRONICS, 2024, 13 (02)