UAV Cross-Modal Image Registration: Large-Scale Dataset and Transformer-Based Approach

Citations: 0
Authors
Xiao, Yun [1]
Liu, Fei [4]
Zhu, Yabin [3]
Li, Chenglong [1,2]
Wang, Futian [4]
Tang, Jin [2,4]
Affiliations
[1] Anhui Univ, Sch Artificial Intelligence, Hefei, Peoples R China
[2] Hefei Comprehens Natl Sci Ctr, Inst Artificial Intelligence, Hefei, Peoples R China
[3] Anhui Univ, Sch Elect & Informat Engn, Hefei, Peoples R China
[4] Anhui Univ, Sch Comp Sci & Technol, Hefei, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Visible-thermal infrared; Cross-modal image registration; UAV dataset; Homography estimation;
DOI
10.1007/978-981-97-1417-9_16
Chinese Library Classification (CLC)
TP301 [Theory and Methods];
Subject Classification Code
081202;
Abstract
It is common to equip unmanned aerial vehicles (UAVs) with visible and thermal infrared cameras so that they can operate around the clock and in all weather conditions. However, the images captured by the two cameras are often severely misaligned. Multimodal methods depend on registered data, which current platforms rarely provide, leaving the captured data unusable for such methods. There is therefore a pressing need for research on UAV cross-modal image registration, yet the scarcity of datasets has limited progress in this area. For this reason, we construct a UAV dataset for visible-infrared image registration (UAV-VIIR), which consists of 5560 image pairs. The dataset includes five challenging conditions: low light, low texture, foggy weather, motion blur, and thermal crossover, and it covers more than a dozen diverse and complex UAV scenes. To the best of our knowledge, it ranks among the largest open-source collections in this field. In addition, we propose a transformer-based homography estimation network (THENet), which incorporates a cross-enhanced transformer module to effectively enhance the features of the different modalities. Extensive experiments on the proposed dataset demonstrate the superiority and effectiveness of our approach compared to state-of-the-art methods.
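The abstract describes the two ingredients of THENet only at a high level: a cross-enhanced transformer module that lets visible and thermal features reinforce each other, and a homography estimation stage. The sketch below illustrates one common way such a design can be realized, using bidirectional cross-attention and the widely used 4-point (corner-offset) homography parameterization. All module names, tensor shapes, and architectural details here are illustrative assumptions, not the authors' actual THENet implementation.

```python
# Minimal PyTorch sketch of cross-modal feature enhancement plus a 4-point
# homography regression head. This is an assumed illustration of the general
# technique, not the paper's architecture.
import torch
import torch.nn as nn


class CrossModalEnhancement(nn.Module):
    """Bidirectional cross-attention between visible and thermal tokens."""

    def __init__(self, dim: int = 256, num_heads: int = 8):
        super().__init__()
        self.vis_from_tir = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.tir_from_vis = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm_vis = nn.LayerNorm(dim)
        self.norm_tir = nn.LayerNorm(dim)

    def forward(self, vis_tokens, tir_tokens):
        # Each modality queries the other; residual connections keep the
        # original modality-specific information.
        vis_enh, _ = self.vis_from_tir(vis_tokens, tir_tokens, tir_tokens)
        tir_enh, _ = self.tir_from_vis(tir_tokens, vis_tokens, vis_tokens)
        return self.norm_vis(vis_tokens + vis_enh), self.norm_tir(tir_tokens + tir_enh)


class HomographyHead(nn.Module):
    """Regress offsets of the 4 image corners (8 values); a 3x3 homography
    can then be recovered from the 4 point correspondences with a DLT solve."""

    def __init__(self, dim: int = 256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(2 * dim, dim), nn.ReLU(inplace=True), nn.Linear(dim, 8)
        )

    def forward(self, vis_tokens, tir_tokens):
        # Global average pooling over tokens, then joint regression.
        fused = torch.cat([vis_tokens.mean(dim=1), tir_tokens.mean(dim=1)], dim=-1)
        return self.mlp(fused).view(-1, 4, 2)  # (batch, 4 corners, xy offset)


if __name__ == "__main__":
    B, N, D = 2, 32 * 32, 256  # batch, tokens (e.g. a 32x32 feature map), channels
    vis, tir = torch.randn(B, N, D), torch.randn(B, N, D)
    vis_e, tir_e = CrossModalEnhancement(D)(vis, tir)
    offsets = HomographyHead(D)(vis_e, tir_e)
    print(offsets.shape)  # torch.Size([2, 4, 2])
```

Predicting four corner offsets rather than the eight homography entries directly keeps the regression targets on a comparable scale; the full homography can then be obtained from the four correspondences with a standard DLT solve (e.g. cv2.getPerspectiveTransform).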
Pages: 166-176
Number of pages: 11