UAV Cross-Modal Image Registration: Large-Scale Dataset and Transformer-Based Approach

Cited by: 0
Authors
Xiao, Yun [1]
Liu, Fei [4]
Zhu, Yabin [3]
Li, Chenglong [1, 2]
Wang, Futian [4]
Tang, Jin [2, 4]
Affiliations
[1] Anhui Univ, Sch Artificial Intelligence, Hefei, Peoples R China
[2] Hefei Comprehens Natl Sci Ctr, Inst Artificial Intelligence, Hefei, Peoples R China
[3] Anhui Univ, Sch Elect & Informat Engn, Hefei, Peoples R China
[4] Anhui Univ, Sch Comp Sci & Technol, Hefei, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Visible-thermal infrared; Cross-modal image registration; UAV dataset; Homography estimation;
DOI
10.1007/978-981-97-1417-9_16
Chinese Library Classification
TP301 [Theory, Methods];
Discipline classification code
081202;
Abstract
It is common to equip unmanned aerial vehicles (UAVs) with visible and thermal infrared cameras so that they can operate around the clock in any weather conditions. However, the images from these two cameras are often significantly misaligned. Multimodal methods depend on registered data, but current platforms often do not provide it, which renders their data unusable for these methods. There is therefore a pressing need for research on UAV cross-modal image registration. So far, a scarcity of datasets has limited progress in this area. For this reason, we construct a dataset for visible-infrared image registration (UAV-VIIR), which consists of 5560 image pairs. The dataset includes five additional challenges: low light, low texture, foggy weather, motion blur, and thermal crossover. It also covers more than a dozen diverse and complex UAV scenes. To the best of our knowledge, it is among the largest open-source datasets available in this field. Additionally, we propose a transformer-based homography estimation network (THENet), which incorporates a cross-enhanced transformer module that effectively enhances the features of the different modalities. Extensive experiments on the proposed dataset demonstrate the superiority and effectiveness of our approach compared with state-of-the-art methods.
Pages: 166-176
Page count: 11