Dual-attention-transformer-based semantic reranking for large-scale image localization

Cited by: 0
Authors
Xiao, Yilin [1]
Du, Siliang [1]
Chen, Xu [1]
Liu, Mingzhong [1]
Sun, Mingwei [2]
Affiliations
[1] Huawei Technol Co Ltd, Wuhan 430074, Hubei, Peoples R China
[2] Wuhan Univ, Sch Remote Sensing & Informat Engn, Wuhan 430079, Hubei, Peoples R China
Keywords
Image localization; Dual-attention-transformer; Semantic reranking; Adaptive triplet loss; Visual place recognition
DOI
10.1007/s10489-024-05539-2
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
The large-scale image-based localization (IBL) problem involves matching a query image against database images of known location to determine where the query was taken. A major challenge stems from the large appearance variations between images captured at the same location, caused by differences in viewpoint, illumination, and season. To address this, we exploit difficult positive samples during training and introduce a retrieval-based framework designed to benefit from them. Its core component is a dual-attention-transformer-based semantic reranking module, which uses semantic segmentation to preserve local feature points. Through channel self-attention and window self-attention, the dual-attention transformer extracts information from global to local scales, which supports both sample augmentation and the final reranking. We also introduce an adaptive triplet loss that incorporates weighted difficult positive samples into the supervision, strengthening the model's robustness. Extensive evaluation on several city-level datasets shows that the framework outperforms state-of-the-art methods, and an ablation study validates the contribution of each individual component.
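The abstract does not give the form of the adaptive triplet loss. As a rough illustration of the general idea only, the following Python sketch weights several positives of one query by their difficulty inside a standard hinge-based triplet loss. The function names, the softmax weighting over positive distances, the hardest-negative choice, and the `margin` and `temperature` values are all our assumptions, not the authors' formulation.

```python
import math
from typing import Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """L2 distance between two descriptor vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def weighted_triplet_loss(
    query: Sequence[float],
    positives: Sequence[Sequence[float]],
    negatives: Sequence[Sequence[float]],
    margin: float = 0.1,
    temperature: float = 1.0,
) -> float:
    """Triplet loss in which harder (more distant) positives get larger weights.

    Hypothetical illustration only: the softmax weighting over positive
    distances is an assumption, not the loss defined in the paper.
    """
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative")
    d_pos = [euclidean(query, p) for p in positives]
    # Hardest negative: the negative closest to the query.
    d_neg = min(euclidean(query, n) for n in negatives)
    # Softmax over positive distances: distant (difficult) positives get more weight.
    scaled = [d / temperature for d in d_pos]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Standard hinge term per positive, combined using the adaptive weights.
    return sum(w * max(0.0, d - d_neg + margin) for w, d in zip(weights, d_pos))
```

For example, with `query = [0.0, 0.0]`, positives `[[1.0, 0.0], [3.0, 0.0]]`, negative `[[2.0, 0.0]]`, and `margin=0.5`, only the distant positive violates the margin, and it also receives most of the weight (about 0.88), so the loss is about 1.32. Lowering `temperature` concentrates the weight more sharply on the hardest positive.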
Pages: 6946-6958
Number of pages: 13
Related papers
50 in total
  • [41] Attention-based dual context aggregation for image semantic segmentation
    Zhao, Dexin
    Qi, Zhiyang
    Yang, Ruixue
    Wang, Zhaohui
    MULTIMEDIA TOOLS AND APPLICATIONS, 2021, 80 (18) : 28201 - 28216
  • [43] Deep Multi-Scale Attention Hashing Network for Large-Scale Image Retrieval
    Feng H.
    Wang N.
    Tang J.
    Huanan Ligong Daxue Xuebao/Journal of South China University of Technology (Natural Science), 2022, 50 (04) : 35 - 45
  • [44] UM2Former: U-Shaped Multimixed Transformer Network for Large-Scale Hyperspectral Image Semantic Segmentation
    Xu, Aijun
    Xue, Zhaohui
    Li, Ziyu
    Cheng, Shun
    Su, Hongjun
    Xia, Junshi
    IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2025, 63
  • [45] SEViT: a large-scale and fine-grained plant disease classification model based on transformer and attention convolution
    Zeng, Qingtian
    Niu, Liangwei
    Wang, Shansong
    Ni, Weijian
    MULTIMEDIA SYSTEMS, 2023, 29 (03) : 1001 - 1010
  • [46] UAV Cross-Modal Image Registration: Large-Scale Dataset and Transformer-Based Approach
    Xiao, Yun
    Liu, Fei
    Zhu, Yabin
    Li, Chenglong
    Wang, Futian
    Tang, Jin
    ADVANCES IN BRAIN INSPIRED COGNITIVE SYSTEMS, BICS 2023, 2024, 14374 : 166 - 176
  • [47] MSAPVT: a multi-scale attention pyramid vision transformer network for large-scale fruit recognition
    Rao, Yao
    Li, Chaofeng
    Xu, Feiran
    Guo, Ya
    JOURNAL OF FOOD MEASUREMENT AND CHARACTERIZATION, 2024, 18 (11) : 9233 - 9251
  • [49] EgoCart: A Benchmark Dataset for Large-Scale Indoor Image-Based Localization in Retail Stores
    Spera, Emiliano
    Furnari, Antonino
    Battiato, Sebastiano
    Farinella, Giovanni Maria
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2021, 31 (04) : 1253 - 1267
  • [50] Knowing What it is: Semantic-Enhanced Dual Attention Transformer
    Ma, Yiwei
    Ji, Jiayi
    Sun, Xiaoshuai
    Zhou, Yiyi
    Wu, Yongjian
    Huang, Feiyue
    Ji, Rongrong
    IEEE TRANSACTIONS ON MULTIMEDIA, 2023, 25 : 3723 - 3736