Transformer-Based Model for Monocular Visual Odometry: A Video Understanding Approach

Cited by: 1
Authors
Francani, Andre O. [1]
Maximo, Marcos R. O. A. [1]
Affiliations
[1] Aeronaut Inst Technol, Autonomous Computat Syst Lab, BR-12228900 Sao Jose Dos Campos, SP, Brazil
Source
IEEE ACCESS | 2025 / Vol. 13
Keywords
Transformers; Visual odometry; Feature extraction; Deep learning; Computer architecture; 6-DOF; Pipelines; Odometry; Vectors; Context modeling; monocular visual odometry; transformer; video understanding
DOI
10.1109/ACCESS.2025.3531667
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline Code
0812
Abstract
Estimating a camera's pose from the images of a single camera is a traditional task in mobile robotics and autonomous vehicles. This problem, called monocular visual odometry, often relies on geometric approaches that require considerable engineering effort for each specific scenario. Deep learning methods have been shown to generalize well after proper training on a large amount of available data. Transformer-based architectures have dominated the state of the art in natural language processing and in computer vision tasks such as image and video understanding. In this work, we treat monocular visual odometry as a video understanding task and estimate the 6 degrees of freedom of the camera's pose. We contribute the TSformer-VO model, which uses spatio-temporal self-attention mechanisms to extract features from clips and estimate motions in an end-to-end manner. Our approach achieved performance competitive with state-of-the-art geometry-based and deep learning-based methods on the KITTI visual odometry dataset, outperforming the widely accepted DeepVO implementation. The code is publicly available at https://github.com/aofrancani/TSformer-VO.
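The abstract describes extracting features from video clips with spatio-temporal self-attention and regressing 6-DoF inter-frame motions end-to-end. As a rough illustration of that idea (not the authors' actual TSformer-VO architecture — see their repository for that), the following minimal NumPy sketch applies divided space-time attention, TimeSformer-style: temporal attention across frames at each patch position, then spatial attention across patches within each frame, followed by a pooling step and a linear head producing one 6-DoF motion vector per frame pair. All shapes, the single untrained head, and the omission of projection weights, multi-head structure, and positional embeddings are simplifying assumptions for illustration only.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x):
    # Single-head scaled dot-product self-attention over the first axis.
    # Query/key/value projections are omitted for brevity (illustrative only).
    d = x.shape[-1]
    w = softmax(x @ x.T / np.sqrt(d))
    return w @ x

def tsformer_vo_sketch(clip_tokens):
    """clip_tokens: (T, N, d) — T frames, N patch tokens per frame, d channels."""
    T, N, d = clip_tokens.shape
    # Temporal attention: each patch position attends across the T frames.
    tmp = np.stack([self_attention(clip_tokens[:, n]) for n in range(N)], axis=1)
    # Spatial attention: within each frame, patches attend to each other.
    out = np.stack([self_attention(tmp[t]) for t in range(T)], axis=0)
    # Pool patch tokens per frame, then regress T-1 relative 6-DoF motions
    # (3 translation + 3 rotation components) with an untrained linear head.
    pooled = out.mean(axis=1)            # (T, d)
    head = np.zeros((d, 6))              # placeholder weights (would be learned)
    return (pooled[1:] - pooled[:-1]) @ head  # (T-1, 6)

poses = tsformer_vo_sketch(np.random.randn(3, 16, 32))
print(poses.shape)  # (2, 6): one 6-DoF motion per consecutive frame pair
```

A clip of T frames yields T-1 relative motions, matching the paper's framing of odometry as regressing camera motion between consecutive frames of a clip.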
Pages: 13959 - 13971
Page count: 13