Dual-attention-transformer-based semantic reranking for large-scale image localization

Authors
Xiao, Yilin [1]
Du, Siliang [1]
Chen, Xu [1]
Liu, Mingzhong [1]
Sun, Mingwei [2]
Affiliations
[1] Huawei Technologies Co., Ltd., Wuhan 430074, Hubei, People's Republic of China
[2] Wuhan University, School of Remote Sensing and Information Engineering, Wuhan 430079, Hubei, People's Republic of China
Keywords
Image localization; Dual-attention-transformer; Semantic reranking; Adaptive triplet loss; Visual place recognition
DOI
10.1007/s10489-024-05539-2
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The large-scale image-based localization (IBL) problem involves matching a query image with a database image to determine the geolocation of the query. A major challenge in this problem stems from significant variations between images captured at the same location, including different viewpoints, illumination conditions, and seasonal changes. To address this issue, we recognize the potential advantages of integrating difficult positive samples into the training process. Consequently, we introduce a novel retrieval-based framework meticulously designed to harness the advantages presented by these difficult positive samples. A pivotal component is the proposed dual-attention-transformer-based semantic reranking module, which leverages semantic segmentation to preserve local feature points. This module, powered by the dual-attention-transformer, extracts nuanced global-to-local information via channel self-attention and window self-attention, thereby facilitating sample augmentation and final reranking. Additionally, we introduce the adaptive triplet loss, a dynamic mechanism incorporating weighted difficult positive samples into supervised information, which strengthens the model's robustness. We extensively evaluate our framework on various city-level datasets and demonstrate its superiority over state-of-the-art methods. Furthermore, an exhaustive ablation study systematically validates the effectiveness of each individual component, underscoring their contributions to the proposed methodology.
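The abstract describes the adaptive triplet loss only in words: difficult positive samples are weighted and folded into the supervision. The record gives no formula, so the following is a minimal sketch of one plausible weighted triplet loss, not the authors' formulation. The function names, the Euclidean distance, the fixed per-positive weights, and the margin value are all assumptions made for illustration.

```python
import math


def l2_distance(a, b):
    """Euclidean distance between two equal-length descriptor vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def weighted_triplet_loss(query, positives, weights, negatives, margin=0.1):
    """Hypothetical weighted triplet loss (an assumption, not the paper's loss).

    Each positive descriptor carries a weight, so a difficult positive can
    contribute with reduced or increased influence. For every positive, the
    hinge term max(0, d(q, p) - d(q, n) + margin) is averaged over all
    negatives, and the result is normalized by the sum of the weights.
    """
    if len(positives) != len(weights):
        raise ValueError("positives and weights must have the same length")
    if not negatives:
        raise ValueError("at least one negative is required")
    weight_sum = sum(weights)
    if weight_sum <= 0:
        raise ValueError("weights must sum to a positive value")

    total = 0.0
    for pos, w in zip(positives, weights):
        d_pos = l2_distance(query, pos)
        hinge_sum = 0.0
        for neg in negatives:
            d_neg = l2_distance(query, neg)
            hinge_sum += max(0.0, d_pos - d_neg + margin)
        total += w * hinge_sum / len(negatives)
    return total / weight_sum
```

In this sketch an easy positive that already lies closer to the query than every negative by at least the margin contributes zero, so the loss is driven by the weighted difficult positives. How the weights are actually computed in the paper is not stated in this record.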
Pages: 6946-6958
Number of pages: 13