UAV Cross-Modal Image Registration: Large-Scale Dataset and Transformer-Based Approach

Authors
Xiao, Yun [1]
Liu, Fei [4]
Zhu, Yabin [3]
Li, Chenglong [1, 2]
Wang, Futian [4]
Tang, Jin [2, 4]
Affiliations
[1] Anhui Univ, Sch Artificial Intelligence, Hefei, Peoples R China
[2] Hefei Comprehens Natl Sci Ctr, Inst Artificial Intelligence, Hefei, Peoples R China
[3] Anhui Univ, Sch Elect & Informat Engn, Hefei, Peoples R China
[4] Anhui Univ, Sch Comp Sci & Technol, Hefei, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Visible-thermal infrared; Cross-modal image registration; UAV dataset; Homography estimation
DOI
10.1007/978-981-97-1417-9_16
Chinese Library Classification number
TP301 [Theory and methods]
Discipline classification code
081202
Abstract
Unmanned aerial vehicles (UAVs) are commonly equipped with visible and thermal infrared cameras so that they can operate around the clock and in all weather conditions. However, the images from the two cameras are often significantly misaligned. Multimodal methods depend on registered data, yet current platforms often provide no registration, which renders their data unusable for these methods. Thus, there is a pressing need for research on UAV cross-modal image registration. At present, a scarcity of datasets has limited progress in this area. For this reason, we construct a dataset for visible-infrared image registration (UAV-VIIR), which consists of 5560 image pairs. The dataset includes five additional challenges: low light, low texture, foggy weather, motion blur, and thermal crossover. Furthermore, it covers more than a dozen diverse and complex UAV scenes. To the best of our knowledge, this dataset is among the largest open-source collections available in this field. Additionally, we propose a transformer-based homography estimation network (THENet), which incorporates a cross-enhanced transformer module that effectively enhances the features of the different modalities. Extensive experiments on the proposed dataset demonstrate the superiority and effectiveness of our approach compared with state-of-the-art methods.
Pages: 166-176
Page count: 11