Cross-Modal Contrastive Learning With Spatiotemporal Context for Correlation-Aware Multiscale Remote Sensing Image Retrieval

Cited: 2
Authors
Zhu, Lilu [1 ]
Wang, Yang [1 ]
Hu, Yanfeng [2 ]
Su, Xiaolu
Fu, Kun [2 ]
Affiliations
[1] Suzhou Aerospace Information Research Institute, Suzhou 215123, People's Republic of China
[2] Chinese Academy of Sciences, Aerospace Information Research Institute, Beijing 100094, People's Republic of China
Keywords
Remote sensing; Optical sensors; Optical imaging; Feature extraction; Semantics; Visualization; Image retrieval; Content-based remote sensing image retrieval (CBRSIR); correlation-aware retrieval; cross-modal contrastive learning; hash index code; hierarchical semantic tree; BENCHMARK; DATASET
DOI
10.1109/TGRS.2024.3417421
Chinese Library Classification (CLC)
P3 [Geophysics]; P59 [Geochemistry]
Discipline Classification Codes
0708; 070902
Abstract
Optical satellites are the most widely used platforms for observing the Earth. Driven by rapidly developing multisource optical remote sensing technology, content-based remote sensing image retrieval (CBRSIR), which aims to retrieve images of interest using extracted visual features, faces new challenges arising from large data volumes, complex feature information, and varied spatiotemporal resolutions. Most previous works focus on representing optical images and transforming them into the semantic space of retrieval via supervised or unsupervised learning. These methods fail to fully leverage geospatial information, especially spatiotemporal features, which could further improve retrieval accuracy and efficiency. In this article, we propose a cross-modal contrastive learning method (CCLS2T) that maximizes the mutual information across multisource remote sensing platforms for correlation-aware retrieval. Specifically, we develop an asymmetric dual-encoder architecture with a vision encoder that operates on multiscale visual inputs and a lightweight text encoder that reconstructs spatiotemporal embeddings, and we apply an intermediate contrastive objective to the representations of the unimodal encoders. A hash layer then transforms the deep fusion features into compact hash index codes. In addition, CCLS2T exploits a prompt template (R2STFT) for multisource remote sensing retrieval to address the text heterogeneity of metadata files, and a hierarchical semantic tree (RSHST) to address the feature sparsification of semantic-aware indexing structures. Experimental results on three optical remote sensing datasets substantiate that CCLS2T improves retrieval performance by 11.64% over existing hash learning methods and by 9.91% over server-side retrieval engines in typical optical remote sensing retrieval scenarios.
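The abstract describes an asymmetric dual-encoder with an intermediate contrastive objective between the unimodal representations and a hash layer that produces compact index codes. Below is a minimal PyTorch sketch of that general pattern, not the authors' CCLS2T implementation: the encoder architectures, input dimensions, additive fusion, tanh hash relaxation, and symmetric InfoNCE objective are all illustrative assumptions.

```python
# Minimal sketch (not the authors' code) of a dual-encoder contrastive
# model with a hash layer, in the spirit of the abstract above.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DualEncoderHash(nn.Module):
    def __init__(self, embed_dim=256, hash_bits=64):
        super().__init__()
        # Vision encoder: a stand-in CNN over (multiscale) visual inputs.
        self.vision_encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, embed_dim),
        )
        # Lightweight text encoder over spatiotemporal metadata embeddings
        # (e.g., encoded acquisition time / location fields).
        self.text_encoder = nn.Sequential(
            nn.Linear(128, embed_dim), nn.ReLU(),
            nn.Linear(embed_dim, embed_dim),
        )
        # Hash layer: tanh relaxation of binary index codes.
        self.hash_layer = nn.Linear(embed_dim, hash_bits)

    def forward(self, images, meta):
        v = F.normalize(self.vision_encoder(images), dim=-1)
        t = F.normalize(self.text_encoder(meta), dim=-1)
        # Additive fusion is an assumption; soft codes in (-1, 1).
        codes = torch.tanh(self.hash_layer(v + t))
        return v, t, codes

def contrastive_loss(v, t, temperature=0.07):
    # Symmetric InfoNCE between unimodal representations: matched
    # image/metadata pairs lie on the diagonal of the similarity matrix.
    logits = v @ t.T / temperature
    targets = torch.arange(v.size(0), device=v.device)
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.T, targets)) / 2

model = DualEncoderHash()
images = torch.randn(8, 3, 64, 64)   # dummy image batch
meta = torch.randn(8, 128)           # dummy spatiotemporal embeddings
v, t, codes = model(images, meta)
loss = contrastive_loss(v, t)
binary_codes = torch.sign(codes)     # binarize at indexing/retrieval time
```

At retrieval time, the soft codes would be binarized (as in the last line) so that Hamming distance over compact hash index codes can replace dense similarity search.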
Pages: 21
Related Papers (50 in total; entries [41]-[50] shown)
• [41] Strong and Weak Prompt Engineering for Remote Sensing Image-Text Cross-Modal Retrieval. Sun, Tianci; Zheng, Chengyu; Li, Xiu; Nie, Jie; Gao, Yanli; Huang, Lei; Wei, Zhiqiang. IEEE JOURNAL OF SELECTED TOPICS IN APPLIED EARTH OBSERVATIONS AND REMOTE SENSING, 2025, 18: 6968-6980.
• [42] Cross-Modal Prealigned Method With Global and Local Information for Remote Sensing Image and Text Retrieval. Sun, Zengbao; Zhao, Ming; Liu, Gaorui; Kaup, Andre. IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2024, 62.
• [43] A Deep Semantic Alignment Network for the Cross-Modal Image-Text Retrieval in Remote Sensing. Cheng, Qimin; Zhou, Yuzhuo; Fu, Peng; Xu, Yuan; Zhang, Liang. IEEE JOURNAL OF SELECTED TOPICS IN APPLIED EARTH OBSERVATIONS AND REMOTE SENSING, 2021, 14: 4284-4297.
• [44] Remote Sensing Cross-Modal Text-Image Retrieval Based on Global and Local Information. Yuan, Zhiqiang; Zhang, Wenkai; Tian, Changyuan; Rong, Xuee; Zhang, Zhengyuan; Wang, Hongqi; Fu, Kun; Sun, Xian. IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2022, 60.
• [45] Contrastive Label Correlation Enhanced Unified Hashing Encoder for Cross-Modal Retrieval. Wu, Hongfa; Zhang, Lisai; Chen, Qingcai; Deng, Yimeng; Siebert, Joanna; Han, Yunpeng; Li, Zhonghua; Kong, Dejiang; Cao, Zhao. PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON INFORMATION AND KNOWLEDGE MANAGEMENT, CIKM 2022, 2022: 2158-2168.
• [46] Cross-Modal Remote Sensing Image Retrieval via Intra- and Inter-Modal Feature Matching. Yao, Fanglong; Liu, Nayu; Li, Peiguang; Yin, Dongshuo; Liu, Chenglong; Sun, Xian. 2022 IEEE INTERNATIONAL GEOSCIENCE AND REMOTE SENSING SYMPOSIUM (IGARSS 2022), 2022: 1792-1795.
• [47] Reducing Semantic Confusion: Scene-Aware Aggregation Network for Remote Sensing Cross-Modal Retrieval. Pan, Jiancheng; Ma, Qing; Bai, Cong. PROCEEDINGS OF THE 2023 ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA RETRIEVAL, ICMR 2023, 2023: 398-406.
• [48] Multiscale Context Deep Hashing for Remote Sensing Image Retrieval. Zhao, Dongjie; Chen, Yaxiong; Xiong, Shengwu. IEEE JOURNAL OF SELECTED TOPICS IN APPLIED EARTH OBSERVATIONS AND REMOTE SENSING, 2023, 16: 7163-7172.
• [49] Towards Sketch-Based Image Retrieval with Deep Cross-Modal Correlation Learning. Huang, Fei; Jin, Cheng; Zhang, Yuejie; Zhang, Tao. 2017 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO (ICME), 2017: 907-912.
• [50] Cross-Modal Feature Description for Remote Sensing Image Matching. Li, Liangzhi; Liu, Ming; Ma, Lingfei; Han, Ling. INTERNATIONAL JOURNAL OF APPLIED EARTH OBSERVATION AND GEOINFORMATION, 2022, 112.