Visibility Aware In-Hand Object Pose Tracking in Videos With Transformers

Cited: 0
Authors
Xuan Tan, Phan [1 ]
Hoang, Dinh-Cuong [2 ]
Kamioka, Eiji [1 ]
Nguyen, Anh-Nhat [3 ]
Tran, Duc-Thanh [3 ]
Duong, Van-Hiep [3 ]
Mai, Anh-Truong [3 ]
Pham, Duc-Long [3 ]
Phan, Khanh-Toan [3 ]
Dinh, Xuan-Tung [3 ]
Trang, Tran Thi Thuy [3 ]
Pham, Xuan-Duong [3 ]
Nguyen, Nhat-Linh [3 ]
Nguyen, Thu-Uyen [3 ]
Trinh, Viet-Anh [2 ]
Tran, Khanh-Duong [2 ]
Bui, Son-Anh [2 ]
Affiliations
[1] Shibaura Inst Technol, Coll Engn, Tokyo 1358548, Japan
[2] FPT Univ, Greenwich Vietnam, Hanoi 10000, Vietnam
[3] FPT Univ, Dept IT, Hanoi 10000, Vietnam
Source
IEEE ACCESS | 2025, Vol. 13
Keywords
Pose estimation; robot vision systems; intelligent systems; deep learning; supervised learning; machine vision;
DOI
10.1109/ACCESS.2025.3545049
Chinese Library Classification (CLC)
TP [Automation Technology, Computer Technology];
Discipline Code
0812;
Abstract
In-hand object pose estimation is essential in various engineering applications, such as quality inspection, reverse engineering, and automated manufacturing processes. However, achieving accurate pose estimation becomes difficult when objects are heavily occluded by the hand or blurred due to motion. To address these challenges, we propose a novel framework that leverages the power of transformers for spatial-temporal reasoning across video sequences. Our approach utilizes transformers to capture both spatial relationships within each frame and temporal dependencies across consecutive frames, allowing the model to aggregate information over time and improve pose predictions. A key innovation of our framework is the introduction of a visibility-aware module, which dynamically adjusts pose estimates based on the object's visibility. This module utilizes temporally-aware features extracted by the transformers, allowing the model to aggregate pose information across multiple frames. By integrating this aggregated information, the model can maintain high accuracy even when portions of the object are not visible in certain frames. This capability is particularly crucial in dynamic environments where the object's appearance can change rapidly due to hand movements or interactions with other objects. Extensive experiments demonstrate that our method significantly outperforms state-of-the-art techniques, achieving a 6% improvement in overall accuracy and over 11% better performance in handling occlusions.
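The visibility-aware aggregation idea in the abstract (down-weighting frames where the object is occluded and fusing pose evidence across the clip) can be illustrated with a minimal sketch. The function name, the softmax weighting, and the restriction to translation-only poses are illustrative assumptions for this sketch; the paper's actual module is a learned component inside a transformer, not this hand-written rule.

```python
import numpy as np

def visibility_weighted_pose(translations, visibility, temperature=1.0):
    """Fuse per-frame translation estimates across a video clip.

    Frames where the object is predicted to be more visible receive a
    larger softmax weight, so heavily occluded frames contribute less
    to the aggregated pose. (Hypothetical sketch, not the authors' code.)
    """
    t = np.asarray(translations, dtype=float)   # shape (num_frames, 3)
    v = np.asarray(visibility, dtype=float)     # shape (num_frames,)
    w = np.exp(v / temperature)
    w /= w.sum()                                # softmax over frames
    return (w[:, None] * t).sum(axis=0)         # visibility-weighted mean
```

With equal visibility scores this reduces to a plain temporal average; a frame with a much lower score is effectively excluded from the fused estimate.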
Pages: 35733 - 35749
Page count: 17
Related Papers
50 in total
  • [1] Enhancing Generalizable 6D Pose Tracking of an In-Hand Object With Tactile Sensing
    Liu, Yun
    Xu, Xiaomeng
    Chen, Weihang
    Yuan, Haocheng
    Wang, He
    Xu, Jing
    Chen, Rui
    Yi, Li
    IEEE ROBOTICS AND AUTOMATION LETTERS, 2024, 9 (02) : 1106 - 1113
  • [2] Tactile-Based In-Hand Object Pose Estimation
    Alvarez, David
    Roa, Maximo A.
    Moreno, Luis
    ROBOT 2017: THIRD IBERIAN ROBOTICS CONFERENCE, VOL 2, 2018, 694 : 716 - 728
  • [3] In-Hand Object Pose Tracking via Contact Feedback and GPU-Accelerated Robotic Simulation
    Liang, Jacky
    Handa, Ankur
    Van Wyk, Karl
    Makoviychuk, Viktor
    Kroemer, Oliver
    Fox, Dieter
    2020 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION (ICRA), 2020, : 6203 - 6209
  • [4] Object Learning for 6D Pose Estimation and Grasping from RGB-D Videos of In-hand Manipulation
    Patten, Timothy
    Park, Kiru
    Leitner, Markus
    Wolfram, Kevin
    Vincze, Markus
    2021 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS (IROS), 2021, : 4831 - 4838
  • [5] Manipulator and object tracking for in-hand 3D object modeling
    Krainin, Michael
    Henry, Peter
    Ren, Xiaofeng
    Fox, Dieter
    INTERNATIONAL JOURNAL OF ROBOTICS RESEARCH, 2011, 30 (11): 1311 - 1327
  • [6] Fusing Joint Measurements and Visual Features for In-Hand Object Pose Estimation
    Pfanne, Martin
    Chalon, Maxime
    Stulp, Freek
    Albu-Schaeffer, Alin
    IEEE ROBOTICS AND AUTOMATION LETTERS, 2018, 3 (04): 3497 - 3504
  • [7] Temporally guided articulated hand pose tracking in surgical videos
    Louis, Nathan
    Zhou, Luowei
    Yule, Steven J.
    Dias, Roger D.
    Manojlovich, Milisa
    Pagani, Francis D.
    Likosky, Donald S.
    Corso, Jason J.
    INTERNATIONAL JOURNAL OF COMPUTER ASSISTED RADIOLOGY AND SURGERY, 2023, 18 (01) : 117 - 125
  • [8] Online in-hand object localization
    Chalon, Maxime
    Reinecke, Jens
    Pfanne, Martin
    2013 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS (IROS), 2013, : 2977 - 2984
  • [9] ViHOPE: Visuotactile In-Hand Object 6D Pose Estimation With Shape Completion
    Li, Hongyu
    Dikhale, Snehal
    Iba, Soshi
    Jamali, Nawid
    IEEE ROBOTICS AND AUTOMATION LETTERS, 2023, 8 (11) : 6963 - 6970