Transformer-Based Model for Monocular Visual Odometry: A Video Understanding Approach

Cited by: 1
Authors
Francani, Andre O. [1]
Maximo, Marcos R. O. A. [1]
Affiliation
[1] Aeronaut Inst Technol, Autonomous Computat Syst Lab, BR-12228900 Sao Jose Dos Campos, SP, Brazil
Source
IEEE ACCESS | 2025 / Vol. 13
Keywords
Transformers; Visual odometry; Feature extraction; Deep learning; Computer architecture; 6-DOF; Pipelines; Odometry; Vectors; Context modeling; monocular visual odometry; transformer; video understanding;
DOI
10.1109/ACCESS.2025.3531667
Chinese Library Classification
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Estimating the camera's pose given images from a single camera is a traditional task in mobile robots and autonomous vehicles. This problem is called monocular visual odometry and often relies on geometric approaches that require considerable engineering effort for a specific scenario. Deep learning methods have been shown to be generalizable after proper training and with a large amount of available data. Transformer-based architectures have dominated the state of the art in natural language processing and computer vision tasks, such as image and video understanding. In this work, we treat monocular visual odometry as a video understanding task to estimate the 6 degrees of freedom of a camera's pose. We contribute by presenting the TSformer-VO model based on spatio-temporal self-attention mechanisms to extract features from clips and estimate the motions in an end-to-end manner. Our approach achieved competitive state-of-the-art performance compared with geometry-based and deep learning-based methods on the KITTI visual odometry dataset, outperforming the DeepVO implementation widely accepted in the visual odometry community. The code is publicly available at https://github.com/aofrancani/TSformer-VO.
Pages: 13959-13971
Number of pages: 13