SWformer-VO: A Monocular Visual Odometry Model Based on Swin Transformer

Cited by: 3
Authors
Wu, Zhigang [1 ]
Zhu, Yaohui [1 ]
Affiliations
[1] Jiangxi Univ Sci & Technol, Sch Energy & Mech Engn, Nanchang 330013, Peoples R China
Keywords
Deep learning; monocular visual odometry; transformer; depth
DOI
10.1109/LRA.2024.3384911
CLC Classification
TP24 [Robotics]
Subject Classification Codes
080202; 1405
Abstract
This letter introduces SWformer-VO, a novel monocular visual odometry network that uses the Swin Transformer as its backbone. It directly estimates the six-degrees-of-freedom camera pose from monocular image sequences, using a modest amount of data in an end-to-end manner. SWformer-VO introduces an embedding module called "Mixture Embed", which fuses each pair of consecutive images into a single frame and converts it into tokens passed to the backbone network, replacing traditional temporal-sequence schemes by addressing the problem at the image level. Building on this foundation, the backbone network's parameters are further tuned and optimized, and experiments examine how the number of layers and the depth of the backbone affect accuracy. On the KITTI dataset, SWformer-VO achieves higher accuracy than common deep learning-based methods introduced in recent years, such as SfMLearner, DeepVO, TSformer-VO, Depth-VO-Feat, GeoNet, and Masked GANs. Its effectiveness is also validated on our self-collected dataset of nine indoor corridor routes for visual odometry.
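The "Mixture Embed" idea, pairing two consecutive frames into a single input before tokenization, can be illustrated with a short sketch. The snippet below is a minimal, hypothetical PyTorch rendering of that idea, not the authors' released code: the names MixtureEmbed, patch_size, and embed_dim are assumptions, and the paper's module may fuse frames differently before the Swin backbone.

```python
# Hypothetical sketch of a "Mixture Embed"-style module: two consecutive RGB
# frames are concatenated channel-wise into one 6-channel "image" and
# patch-embedded into tokens for a Swin-style backbone.
import torch
import torch.nn as nn

class MixtureEmbed(nn.Module):  # name is illustrative, not from the paper
    def __init__(self, patch_size=4, embed_dim=96):
        super().__init__()
        # 6 input channels: frame t (3) concatenated with frame t+1 (3).
        self.proj = nn.Conv2d(6, embed_dim,
                              kernel_size=patch_size, stride=patch_size)

    def forward(self, frame_t, frame_t1):
        x = torch.cat([frame_t, frame_t1], dim=1)  # (B, 6, H, W)
        x = self.proj(x)                           # (B, C, H/ps, W/ps)
        return x.flatten(2).transpose(1, 2)        # (B, num_tokens, C)

# Usage: tokens for one pair of 224x224 frames.
embed = MixtureEmbed()
t0, t1 = torch.randn(1, 3, 224, 224), torch.randn(1, 3, 224, 224)
tokens = embed(t0, t1)  # shape (1, 3136, 96)
```

In such a scheme the tokens then enter the Swin Transformer backbone directly, so the temporal relation between frames is carried at the image level rather than by a separate sequence model, consistent with the abstract's description.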
Pages: 4766-4773 (8 pages)
Related Papers (50 total)
  • [1] Transformer-Based Model for Monocular Visual Odometry: A Video Understanding Approach
    Francani, Andre O.
    Maximo, Marcos R. O. A.
    IEEE ACCESS, 2025, 13 : 13959 - 13971
  • [2] Transformer-Based Self-Supervised Monocular Depth and Visual Odometry
    Zhao, Hongru
    Qiao, Xiuquan
    Ma, Yi
    Tafazolli, Rahim
    IEEE SENSORS JOURNAL, 2023, 23 (02) : 1436 - 1446
  • [3] Dense Prediction Transformer for Scale Estimation in Monocular Visual Odometry
    Francani, Andre O.
    Maximo, Marcos R. O. A.
    2022 LATIN AMERICAN ROBOTICS SYMPOSIUM (LARS), 2022 BRAZILIAN SYMPOSIUM ON ROBOTICS (SBR), AND 2022 WORKSHOP ON ROBOTICS IN EDUCATION (WRE), 2022, : 312 - 317
  • [4] RAUM-VO: Rotational Adjusted Unsupervised Monocular Visual Odometry
    Cimarelli, Claudio
    Bavle, Hriday
    Sanchez-Lopez, Jose Luis
    Voos, Holger
    SENSORS, 2022, 22 (07)
  • [5] D2VO: Monocular Deep Direct Visual Odometry
    Jia, Qizeng
    Pu, Yuechuan
    Chen, Jingyu
    Cheng, Junda
    Liao, Chunyuan
    Yang, Xin
    2020 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS (IROS), 2020, : 10158 - 10165
  • [6] Monocular Visual Odometry Based on Hybrid Parameterization
    Mohamed, Sherif A. S.
    Haghbayan, Mohammad-Hashem
    Heikkonen, Jukka
    Tenhunen, Hannu
    Plosila, Juha
    TWELFTH INTERNATIONAL CONFERENCE ON MACHINE VISION (ICMV 2019), 2020, 11433
  • [7] ARM-VO: an efficient monocular visual odometry for ground vehicles on ARM CPUs
    Nejad, Zana Zakaryaie
    Ahmadabadian, Ali Hosseininaveh
    MACHINE VISION AND APPLICATIONS, 2019, 30 (06) : 1061 - 1070
  • [8] Monocular Visual Odometry Based on Trifocal Tensor Constraint
    Chen, Y. J.
    Yang, G. L.
    Jiang, Y. X.
    Liu, X. Y.
    2018 INTERNATIONAL CONFERENCE ON CONTROL ENGINEERING AND ARTIFICIAL INTELLIGENCE (CCEAI 2018), 2018, 976
  • [9] Inertial Monocular Visual Odometry Based on RUPF Algorithm
    Hou, Juanrou
    Wang, Zhanqing
    Zhang, Yanshun
    PROCEEDINGS OF THE 38TH CHINESE CONTROL CONFERENCE (CCC), 2019, : 3885 - 3891