Visibility Aware In-Hand Object Pose Tracking in Videos With Transformers

Times Cited: 0
Authors
Xuan Tan, Phan [1 ]
Hoang, Dinh-Cuong [2 ]
Kamioka, Eiji [1 ]
Nguyen, Anh-Nhat [3 ]
Tran, Duc-Thanh [3 ]
Duong, Van-Hiep [3 ]
Mai, Anh-Truong [3 ]
Pham, Duc-Long [3 ]
Phan, Khanh-Toan [3 ]
Dinh, Xuan-Tung [3 ]
Trang, Tran Thi Thuy [3 ]
Pham, Xuan-Duong [3 ]
Nguyen, Nhat-Linh [3 ]
Nguyen, Thu-Uyen [3 ]
Trinh, Viet-Anh [2 ]
Tran, Khanh-Duong [2 ]
Bui, Son-Anh [2 ]
Affiliations
[1] Shibaura Inst Technol, Coll Engn, Tokyo 1358548, Japan
[2] FPT Univ, Greenwich Vietnam, Hanoi 10000, Vietnam
[3] FPT Univ, Dept IT, Hanoi 10000, Vietnam
Source
IEEE ACCESS | 2025, Vol. 13
Keywords
Pose estimation; robot vision systems; intelligent systems; deep learning; supervised learning; machine vision;
DOI
10.1109/ACCESS.2025.3545049
CLC Number
TP [Automation Technology; Computer Technology]
Discipline Code
0812
Abstract
In-hand object pose estimation is essential in various engineering applications, such as quality inspection, reverse engineering, and automated manufacturing. However, accurate pose estimation becomes difficult when objects are heavily occluded by the hand or blurred by motion. To address these challenges, we propose a novel framework that leverages transformers for spatial-temporal reasoning across video sequences. Our approach uses transformers to capture both spatial relationships within each frame and temporal dependencies across consecutive frames, allowing the model to aggregate information over time and refine its pose predictions. A key innovation of our framework is a visibility-aware module, which dynamically adjusts pose estimates based on the object's visibility. Operating on the temporally aware features extracted by the transformers, this module aggregates pose information across multiple frames, so the model maintains high accuracy even when portions of the object are not visible in certain frames. This capability is particularly crucial in dynamic environments, where the object's appearance can change rapidly due to hand movements or interactions with other objects. Extensive experiments demonstrate that our method significantly outperforms state-of-the-art techniques, achieving a 6% improvement in overall accuracy and over 11% better performance in handling occlusions.
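The mechanism the abstract describes can be illustrated with a short sketch: per-frame object features pass through a temporal transformer encoder, a visibility head scores how visible the object is in each frame, and those scores weight the per-frame pose estimates when fusing them across the clip. This is a minimal PyTorch illustration, not the authors' implementation; the module names, feature dimension, pose parameterization (3-D translation plus quaternion), and the simple weighted-sum fusion are all assumptions made for clarity.

```python
# Minimal sketch of visibility-weighted temporal pose fusion.
# NOT the paper's code: layer sizes, the pose parameterization, and the
# weighted-sum fusion are illustrative assumptions.
import torch
import torch.nn as nn

class VisibilityAwarePoseTracker(nn.Module):
    def __init__(self, feat_dim=256, n_heads=8, n_layers=4, pose_dim=7):
        super().__init__()
        # Temporal transformer: lets each frame attend to its neighbors,
        # borrowing evidence from frames where the object is less occluded.
        layer = nn.TransformerEncoderLayer(
            d_model=feat_dim, nhead=n_heads, batch_first=True)
        self.temporal_encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        # Per-frame pose regressor (here: 3-D translation + unit quaternion).
        self.pose_head = nn.Linear(feat_dim, pose_dim)
        # Visibility head: one score in (0, 1) per frame.
        self.visibility_head = nn.Sequential(nn.Linear(feat_dim, 1), nn.Sigmoid())

    def forward(self, frame_feats):
        # frame_feats: (B, T, feat_dim) per-frame object features from an
        # upstream backbone (assumed given).
        feats = self.temporal_encoder(frame_feats)   # (B, T, feat_dim)
        poses = self.pose_head(feats)                # (B, T, pose_dim)
        vis = self.visibility_head(feats)            # (B, T, 1)
        # Visibility-aware fusion: heavily occluded frames get small weights.
        # (A real system would average rotations properly, e.g. on SO(3);
        # the linear weighted sum here is only for illustration.)
        w = vis / vis.sum(dim=1, keepdim=True).clamp_min(1e-6)
        fused = (w * poses).sum(dim=1)               # (B, pose_dim)
        return poses, vis.squeeze(-1), fused

# Example: a batch of 2 clips, 8 frames each.
model = VisibilityAwarePoseTracker()
poses, vis, fused = model(torch.randn(2, 8, 256))
```

The key design point the sketch captures is that the visibility scores come from the temporally contextualized features rather than from raw per-frame features, so a frame can be down-weighted based on evidence from its neighbors as well as its own appearance.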
Pages: 35733-35749
Page count: 17
Related Papers (items [21]-[30] of 50)
  • [21] VisuoTactile 6D Pose Estimation of an In-Hand Object Using Vision and Tactile Sensor Data
    Dikhale, Snehal
    Patel, Karankumar
    Dhingra, Daksh
    Naramura, Itoshi
    Hayashi, Akinobu
    Iba, Soshi
    Jamali, Nawid
    IEEE ROBOTICS AND AUTOMATION LETTERS, 2022, 7 (02) : 2148 - 2155
  • [22] Learning Haptic-Based Object Pose Estimation for In-Hand Manipulation Control With Underactuated Robotic Hands
    Azulay, Osher
    Ben-David, Inbar
    Sintov, Avishai
    IEEE TRANSACTIONS ON HAPTICS, 2023, 16 (01) : 73 - 85
  • [23] In-Hand Pose Refinement Based on Contact Point Information
    Iturrate, Inigo
    Kim, Yitaek
    Kramberger, Aljaz
    Sloth, Christoffer
    ADVANCES IN SERVICE AND INDUSTRIAL ROBOTICS, RAAD 2023, 2023, 135 : 29 - 36
  • [24] Robotic hand synergies for in-hand regrasping driven by object information
    Dimou, Dimitrios
    Santos-Victor, Jose
    Moreno, Plinio
    AUTONOMOUS ROBOTS, 2023, 47 (04) : 453 - 464
  • [25] Active In-Hand Object Recognition on a Humanoid Robot
    Browatzki, Bjoern
    Tikhanoff, Vadim
    Metta, Giorgio
    Buelthoff, Heinrich H.
    Wallraven, Christian
    IEEE TRANSACTIONS ON ROBOTICS, 2014, 30 (05) : 1260 - 1269
  • [26] A Comparison of Tactile Sensors for In-Hand Object Location
    Fernandez, Raul
    Vazquez, Andres S.
    Payo, Ismael
    Adan, Antonio
    JOURNAL OF SENSORS, 2016, 2016
  • [28] General In-Hand Object Rotation with Vision and Touch
    Qi, Haozhi
    Yi, Brent
    Suresh, Sudharshan
    Lambeta, Mike
    Ma, Yi
    Calandra, Roberto
    Malik, Jitendra
    CONFERENCE ON ROBOT LEARNING, VOL 229, 2023, 229
  • [29] Pose Guided Feature Learning for 3D Object Tracking on RGB Videos
    Majcher, Mateusz
    Kwolek, Bogdan
    PROCEEDINGS OF THE 17TH INTERNATIONAL JOINT CONFERENCE ON COMPUTER VISION, IMAGING AND COMPUTER GRAPHICS THEORY AND APPLICATIONS (VISAPP), VOL 5, 2022, : 574 - 581
  • [30] Comparison of general object trackers for hand tracking in high-speed videos
    Hiltunen, Ville
    Eerola, Tuomas
    Lensu, Lasse
    Kalviainen, Heikki
    2014 22ND INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2014, : 2215 - 2220