Deep Cross-Modal Retrieval for Remote Sensing Image and Audio

被引:0
|
作者
Guo Mao [1 ,2 ]
Yuan Yuan [1 ]
Lu Xiaoqiang [1 ]
机构
[1] Chinese Acad Sci, Xian Inst Opt & Precis Mech, Ctr Opt IMagery Anal & Learning OPTIMAL, Xian 710119, Shaanxi, Peoples R China
[2] Univ Chinese Acad Sci, Beijing 100049, Peoples R China
关键词
cross-modal retrieval; remote sensing image; spoken audio; convolutional neural network; CONVOLUTIONAL NEURAL-NETWORKS;
D O I
暂无
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
Remote sensing image retrieval has many important applications in civilian and military fields, such as disaster monitoring and target detecting. However, the existing research on image retrieval, mainly including to two directions, text based and content based, cannot meet the rapid and convenient needs of some special applications and emergency scenes. Based on text, the retrieval is limited by keyboard inputting because of its lower efficiency for some urgent situations and based on content, it needs an example image as reference, which usually does not exist. Yet speech, as a direct, natural and efficient human-machine interactive way, can make up these shortcomings. Hence, a novel cross-modal retrieval method for remote sensing image and spoken audio is proposed in this paper. We first build a large-scale remote sensing image dataset with plenty of manual annotated spoken audio captions for the cross-modal retrieval task. Then a Deep Visual-Audio Network is designed to directly learn the correspondence of image and audio. And this model integrates feature extracting and multi-modal learning into the same network. Experiments on the proposed dataset verify the effectiveness of our approach and prove that it is feasible for speech-to-image retrieval.
引用
收藏
页数:6
相关论文
共 50 条
  • [41] Exploring Uni-Modal Feature Learning on Entities and Relations for Remote Sensing Cross-Modal Text-Image Retrieval
    Zhang, Shun
    Li, Yupeng
    Mei, Shaohui
    IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2023, 61
  • [42] MGAN: Attempting a Multimodal Graph Attention Network for Remote Sensing Cross-Modal Text-Image Retrieval
    Wang, Zhiming
    Dong, Zhihua
    Yang, Xiaoyu
    Wang, Zhiguo
    Yin, Guangqiang
    PROCEEDINGS OF THE 13TH INTERNATIONAL CONFERENCE ON COMPUTER ENGINEERING AND NETWORKS, VOL II, CENET 2023, 2024, 1126 : 261 - 273
  • [43] Cross-Modal Remote Sensing Image-Text Retrieval via Context and Uncertainty-Aware Prompt
    Wang, Yijing
    Tang, Xu
    Ma, Jingjing
    Zhang, Xiangrong
    Liu, Fang
    Jiao, Licheng
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2024,
  • [44] Cross-Modal Contrastive Learning With Spatiotemporal Context for Correlation-Aware Multiscale Remote Sensing Image Retrieval
    Zhu, Lilu
    Wang, Yang
    Hu, Yanfeng
    Su, Xiaolu
    Fu, Kun
    IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2024, 62
  • [45] A Fusion Encoder with Multi-Task Guidance for Cross-Modal Text-Image Retrieval in Remote Sensing
    Zhang, Xiong
    Li, Weipeng
    Wang, Xu
    Wang, Luyao
    Zheng, Fuzhong
    Wang, Long
    Zhang, Haisu
    REMOTE SENSING, 2023, 15 (18)
  • [46] Cross-Modal Hashing With Feature Semi-Interaction and Semantic Ranking for Remote Sensing Ship Image Retrieval
    Sun, Yuxi
    Ye, Yunming
    Kang, Jian
    Fernandez-Beltran, Ruben
    Ban, Yifang
    Hafner, Sebastian
    Li, Xutao
    Luo, Chuyao
    Plaza, Antonio
    IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2024, 62 : 1 - 15
  • [47] Hypersphere-Based Remote Sensing Cross-Modal Text-Image Retrieval via Curriculum Learning
    Zhang, Weihang
    Li, Jihao
    Li, Shuoke
    Chen, Jialiang
    Zhang, Wenkai
    Gao, Xin
    Sun, Xian
    IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2023, 61
  • [48] Cross-Modal Retrieval Using Deep Learning
    Malik, Shaily
    Bhardwaj, Nikhil
    Bhardwaj, Rahul
    Kumar, Saurabh
    PROCEEDINGS OF THIRD DOCTORAL SYMPOSIUM ON COMPUTATIONAL INTELLIGENCE, DOSCI 2022, 2023, 479 : 725 - 734
  • [49] Deep Semantic Mapping for Cross-Modal Retrieval
    Wang, Cheng
    Yang, Haojin
    Meinel, Christoph
    2015 IEEE 27TH INTERNATIONAL CONFERENCE ON TOOLS WITH ARTIFICIAL INTELLIGENCE (ICTAI 2015), 2015, : 234 - 241
  • [50] Deep Relation Embedding for Cross-Modal Retrieval
    Zhang, Yifan
    Zhou, Wengang
    Wang, Min
    Tian, Qi
    Li, Houqiang
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2021, 30 : 617 - 627