UAV Cross-Modal Image Registration: Large-Scale Dataset and Transformer-Based Approach

Cited by: 0
Authors
Xiao, Yun [1]
Liu, Fei [4]
Zhu, Yabin [3]
Li, Chenglong [1, 2]
Wang, Futian [4]
Tang, Jin [2, 4]
Affiliations
[1] Anhui Univ, Sch Artificial Intelligence, Hefei, Peoples R China
[2] Hefei Comprehens Natl Sci Ctr, Inst Artificial Intelligence, Hefei, Peoples R China
[3] Anhui Univ, Sch Elect & Informat Engn, Hefei, Peoples R China
[4] Anhui Univ, Sch Comp Sci & Technol, Hefei, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Visible-thermal infrared; Cross-modal image registration; UAV dataset; Homography estimation
DOI
10.1007/978-981-97-1417-9_16
Chinese Library Classification (CLC) number
TP301 [Theory and methods]
Discipline classification code
081202
Abstract
Unmanned aerial vehicles (UAVs) are commonly equipped with visible and thermal infrared cameras so that they can operate around the clock and in any weather. However, the images from the two cameras are often significantly misaligned. Multimodal methods depend on registered data, whereas current platforms often lack registration, which renders their data unusable for these methods. Thus, there is a pressing need for research on UAV cross-modal image registration. At present, a scarcity of datasets has limited progress in this area. For this reason, we construct a dataset for visible-infrared image registration (UAV-VIIR), which consists of 5560 image pairs. The dataset covers five additional challenges: low light, low texture, foggy weather, motion blur, and thermal crossover. Furthermore, it covers more than a dozen diverse and complex UAV scenes. To the best of our knowledge, this dataset ranks among the largest open-source collections available in this field. Additionally, we propose a transformer-based homography estimation network (THENet), which incorporates a cross-enhanced transformer module that effectively enhances the features of the different modalities. Extensive experiments on our proposed dataset demonstrate the superiority and effectiveness of our approach compared with state-of-the-art methods.
Pages: 166-176
Page count: 11