Dual-attention-transformer-based semantic reranking for large-scale image localization

Authors
Xiao, Yilin [1]
Du, Siliang [1]
Chen, Xu [1]
Liu, Mingzhong [1]
Sun, Mingwei [2]
Affiliations
[1] Huawei Technologies Co., Ltd., Wuhan 430074, Hubei, People's Republic of China
[2] Wuhan University, School of Remote Sensing and Information Engineering, Wuhan 430079, Hubei, People's Republic of China
Keywords
Image localization; Dual-attention-transformer; Semantic reranking; Adaptive triplet loss; Visual place recognition
DOI
10.1007/s10489-024-05539-2
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The large-scale image-based localization (IBL) problem involves matching a query image against database images to determine the geolocation of the query. A major challenge is the significant variation between images captured at the same location, including differences in viewpoint, illumination, and season. To address this, we incorporate difficult positive samples into the training process and introduce a novel retrieval-based framework designed to exploit them. A key component is the proposed dual-attention-transformer-based semantic reranking module, which uses semantic segmentation to preserve local feature points. The module extracts global-to-local information through channel self-attention and window self-attention, which supports both sample augmentation and final reranking. We also introduce an adaptive triplet loss, which dynamically incorporates weighted difficult positive samples into the supervision and strengthens the model's robustness. We evaluate the framework on several city-level datasets, where it outperforms state-of-the-art methods. An ablation study validates the effectiveness of each individual component.
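The abstract does not give the formula for the adaptive triplet loss, so the following is only a minimal sketch of the general idea of weighting difficult positives inside a triplet loss. The function names, the softmax weighting, the `pos_scores` input, and the margin value are all assumptions made for illustration, not the authors' formulation.

```python
import math


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length descriptors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def softmax(scores):
    """Turn arbitrary scores into positive weights that sum to 1."""
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_triplet_loss(query, positives, pos_scores, negatives, margin=0.1):
    """Triplet hinge loss in which each positive is scaled by a weight.

    Assumption: pos_scores are hypothetical per-positive confidence scores
    (for example, produced by a reranking step). They are normalized with
    a softmax so that more trusted positives contribute more to the loss.
    """
    weights = softmax(pos_scores)
    loss = 0.0
    for w, p in zip(weights, positives):
        d_pos = sq_dist(query, p)
        for n in negatives:
            # Hinge term: penalize when the positive is not closer to the
            # query than the negative by at least the margin.
            loss += w * max(0.0, d_pos - sq_dist(query, n) + margin)
    return loss


# Toy 2-D descriptors: one easy positive, one difficult positive, one negative.
query = [0.0, 0.0]
positives = [[1.0, 0.0], [0.0, 2.0]]
negatives = [[1.0, 1.0]]
loss_equal = weighted_triplet_loss(query, positives, [0.0, 0.0], negatives)
loss_skewed = weighted_triplet_loss(query, positives, [0.0, 2.0], negatives)
```

In this toy case only the difficult positive violates the margin, so raising its score increases the loss. In a real training setup the descriptors would come from the network and the loss would be written in a framework with automatic differentiation.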
Pages: 6946-6958
Page count: 13