R2SCAT-LPR: Rotation-Robust Network with Self- and Cross-Attention Transformers for LiDAR-Based Place Recognition

Cited: 0
Authors
Jiang, Weizhong [1 ]
Xue, Hanzhang [2 ]
Si, Shubin [1 ,3 ]
Xiao, Liang [1 ]
Zhao, Dawei [1 ]
Zhu, Qi [1 ]
Nie, Yiming [1 ]
Dai, Bin [1 ]
Affiliations
[1] Def Innovat Inst, Unmanned Syst Technol Res Ctr, Beijing 100071, Peoples R China
[2] Natl Univ Def Technol, Test Ctr, Xian 710106, Peoples R China
[3] Harbin Engn Univ, Coll Intelligent Syst Sci & Engn, Harbin 150001, Peoples R China
Keywords
LiDAR-based place recognition; self-attention; cross-attention; multi-level patch features; global context; rotation-robust; SCAN CONTEXT; DESCRIPTOR;
DOI
10.3390/rs17061057
Chinese Library Classification
X [Environmental Science, Safety Science]
Subject Classification
08; 0830
Abstract
LiDAR-based place recognition (LPR) is crucial for the navigation and localization of autonomous vehicles and mobile robots in large-scale outdoor environments and plays a critical role in loop closure detection for simultaneous localization and mapping (SLAM). Existing LPR methods, which utilize 2D bird's-eye view (BEV) projections of 3D point clouds, achieve competitive performance in efficiency and recognition accuracy. However, these methods often struggle with capturing global contextual information and maintaining robustness to viewpoint variations. To address these challenges, we propose R2SCAT-LPR, a novel, transformer-based model that leverages self-attention and cross-attention mechanisms to extract rotation-robust place feature descriptors from BEV images. R2SCAT-LPR consists of three core modules: (1) R2MPFE, which employs weight-shared cascaded multi-head self-attention (MHSA) to extract multi-level spatial contextual patch features from both the original BEV image and its randomly rotated counterpart; (2) DSCA, which integrates dual-branch self-attention and multi-head cross-attention (MHCA) to capture intrinsic correspondences between multi-level patch features before and after rotation, enhancing the extraction of rotation-robust local features; and (3) a combined NetVLAD module, which aggregates patch features from both the original feature space and the rotated interaction space into a compact and viewpoint-robust global descriptor. Extensive experiments conducted on the KITTI and NCLT datasets validate the effectiveness of the proposed model, demonstrating its robustness to rotation variations and its generalization ability across diverse scenes and LiDAR sensor types. Furthermore, we evaluate the generalization performance and computational efficiency of R2SCAT-LPR on our self-constructed OffRoad-LPR dataset for off-road autonomous driving, verifying its deployability on resource-constrained platforms.
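The pipeline the abstract outlines (weight-shared attention over patch features of the original and rotated BEV image, cross-attention between the two branches, and NetVLAD aggregation into a global descriptor) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: single-head attention stands in for MHSA/MHCA, the cascaded multi-level structure of R2MPFE and DSCA is collapsed to one layer, and all shapes, weights, and the patch features themselves are placeholder random values chosen for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    # Single-head self-attention over patch features X: (n_patches, d).
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    A = softmax(Q @ K.T / np.sqrt(K.shape[1]), axis=-1)
    return A @ V

def cross_attention(Xq, Xkv, Wq, Wk, Wv):
    # Queries from one branch, keys/values from the other branch.
    Q, K, V = Xq @ Wq, Xkv @ Wk, Xkv @ Wv
    A = softmax(Q @ K.T / np.sqrt(K.shape[1]), axis=-1)
    return A @ V

def netvlad(X, centroids, alpha=1.0):
    # Soft-assign patch features X: (n, d) to K centroids, aggregate
    # residuals, then intra-normalize and L2-normalize (standard NetVLAD).
    sim = -alpha * ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
    a = softmax(sim, axis=1)                                 # (n, K)
    resid = X[:, None, :] - centroids[None, :, :]            # (n, K, d)
    V = (a[..., None] * resid).sum(0)                        # (K, d)
    V /= np.linalg.norm(V, axis=1, keepdims=True) + 1e-12
    v = V.ravel()
    return v / (np.linalg.norm(v) + 1e-12)

rng = np.random.default_rng(0)
n, d, K = 16, 32, 8
Wq, Wk, Wv = (0.1 * rng.standard_normal((d, d)) for _ in range(3))

# Placeholder patch features of the original BEV image and a rotated copy;
# the same (shared) weights process both branches.
X_orig = rng.standard_normal((n, d))
X_rot = rng.standard_normal((n, d))
F_orig = self_attention(X_orig, Wq, Wk, Wv)
F_rot = self_attention(X_rot, Wq, Wk, Wv)

# Cross-attend original-branch queries against rotated-branch keys/values
# to model correspondences before and after rotation.
F_cross = cross_attention(F_orig, F_rot, Wq, Wk, Wv)

# Aggregate both the original feature space and the rotated interaction
# space into one global descriptor via concatenated NetVLAD vectors.
C = rng.standard_normal((K, d))
desc = np.concatenate([netvlad(F_orig, C), netvlad(F_cross, C)])
print(desc.shape)  # (512,) = 2 * K * d
```

At retrieval time, place matching would reduce to nearest-neighbor search over such fixed-length descriptors; because each NetVLAD vector is L2-normalized, descriptor similarity can be computed by inner product.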
Pages: 25