Self-Supervision Interactive Alignment for Remote Sensing Image-Audio Retrieval

Cited by: 6
Authors
Huang, Jinghao [1 ,2 ]
Chen, Yaxiong [1 ,2 ,3 ,4 ]
Xiong, Shengwu [1 ,2 ,3 ,4 ]
Lu, Xiaoqiang [5 ]
Affiliations
[1] Wuhan Univ Technol, Sanya Sci & Educ Innovat Pk, Sanya 572000, Peoples R China
[2] Wuhan Univ Technol, Sch Comp Sci & Artificial Intelligence, Wuhan 430070, Peoples R China
[3] Shanghai Artificial Intelligence Lab, Shanghai 200232, Peoples R China
[4] Wuhan Univ Technol, Chongqing Res Inst, Chongqing 401122, Peoples R China
[5] Fuzhou Univ, Coll Phys & Informat Engn, Fuzhou 350108, Peoples R China
Source
IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING | 2023 / Vol. 61
Keywords
Remote sensing; Semantics; Visualization; Transformers; Task analysis; Unsupervised learning; Technological innovation; Cross-modal remote sensing (RS) retrieval; interactive alignment (IA); self-supervised learning; similarity preservation; SPARSE;
DOI
10.1109/TGRS.2023.3264006
Chinese Library Classification
P3 [Geophysics]; P59 [Geochemistry];
Subject classification code
0708 ; 070902 ;
Abstract
Cross-modal remote sensing image-audio (RSIA) retrieval aims to use audio or remote sensing images (RSIs) as queries to retrieve relevant RSIs or corresponding audios. Although many approaches leverage labeled samples to achieve good performance, obtaining labeled samples is costly, because labeling cross-modal remote sensing (RS) samples usually requires substantial human labor. Therefore, unsupervised cross-modal learning is very important in real-world applications. In this article, we propose a novel unsupervised cross-modal RSIA retrieval approach, named self-supervision interactive alignment (SSIA), which can take advantage of large amounts of unlabeled samples to learn the salient information, cross-modal alignment, and the similarity between RSIs and audios. Since self-supervised learning lacks the supervision of label information, we leverage the similarity between the input RSI information and audio information as the supervision information. Besides, to perform cross-modal alignment, a novel interactive alignment (IA) module is designed to explore the fine-grained correspondence between RSIs and audios. Moreover, we design an audio-guided image de-redundant module that reduces redundancy in the visual information and thereby captures the salient information of RSIs. Extensive experiments on four widely used RSIA datasets demonstrate that SSIA achieves better RSIA retrieval performance than the compared approaches.
Pages: 14
Related papers
50 in total
  • [1] Multimodal Fusion Remote Sensing Image-Audio Retrieval
    Yang, Rui
    Wang, Shuang
    Sun, Yingzhi
    Zhang, Huan
    Liao, Yu
    Gu, Yu
    Hou, Biao
    Jiao, Licheng
    IEEE JOURNAL OF SELECTED TOPICS IN APPLIED EARTH OBSERVATIONS AND REMOTE SENSING, 2022, 15 : 6220 - 6235
  • [2] Fine Aligned Discriminative Hashing for Remote Sensing Image-Audio Retrieval
    Chen, Yaxiong
    Huang, Jinghao
    Xiong, Shengwu
    Lu, Xiaoqiang
    IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2023, 61
  • [3] Cross-Modal Remote Sensing Image-Audio Retrieval With Adaptive Learning for Aligning Correlation
    Huang, Jinghao
    Chen, Yaxiong
    Xiong, Shengwu
    Lu, Xiaoqiang
    IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2024, 62
  • [4] Self-supervision assisted multimodal remote sensing image classification with coupled self-looping convolution networks
    Pande, Shivam
    Banerjee, Biplab
    NEURAL NETWORKS, 2023, 164 : 1 - 20
  • [5] InsCLR: Improving Instance Retrieval with Self-Supervision
    Deng, Zelu
    Zhong, Yujie
    Guo, Sheng
    Huang, Weilin
    THIRTY-SIXTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FOURTH CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE / THE TWELVETH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2022 : 516 - 524
  • [6] Pre-Training Audio Representations With Self-Supervision
    Tagliasacchi, Marco
    Gfeller, Beat
    Quitry, Felix de Chaumont
    Roblek, Dominik
    IEEE SIGNAL PROCESSING LETTERS, 2020, 27 : 600 - 604
  • [7] Semantic alignment with self-supervision for class incremental learning
    Fu, Zhiling
    Wang, Zhe
    Xu, Xinlei
    Yang, Mengping
    Chi, Ziqiu
    Ding, Weichao
    KNOWLEDGE-BASED SYSTEMS, 2023, 282
  • [8] PVASS-MDD: Predictive Visual-Audio Alignment Self-Supervision for Multimodal Deepfake Detection
    Yu, Yang
    Liu, Xiaolong
    Ni, Rongrong
    Yang, Siyuan
    Zhao, Yao
    Kot, Alex C.
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2024, 34 (08) : 6926 - 6936
  • [9] Self-Supervision, Remote Sensing and Abstraction: Representation Learning Across 3 Million Locations
    Seneviratne, Sachith
    Nice, Kerry A.
    Wijnands, Jasper S.
    Stevenson, Mark
    Thompson, Jason
    2021 INTERNATIONAL CONFERENCE ON DIGITAL IMAGE COMPUTING: TECHNIQUES AND APPLICATIONS (DICTA 2021), 2021 : 189 - 196
  • [10] Interactive learning and probabilistic retrieval in remote sensing image archives
    Schröder, M
    Rehrauer, H
    Seidel, K
    Datcu, M
    IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2000, 38 (05) : 2288 - 2298