Visibility Aware In-Hand Object Pose Tracking in Videos With Transformers

Cited by: 0
Authors
Xuan Tan, Phan [1 ]
Hoang, Dinh-Cuong [2 ]
Kamioka, Eiji [1 ]
Nguyen, Anh-Nhat [3 ]
Tran, Duc-Thanh [3 ]
Duong, Van-Hiep [3 ]
Mai, Anh-Truong [3 ]
Pham, Duc-Long [3 ]
Phan, Khanh-Toan [3 ]
Dinh, Xuan-Tung [3 ]
Trang, Tran Thi Thuy [3 ]
Pham, Xuan-Duong [3 ]
Nguyen, Nhat-Linh [3 ]
Nguyen, Thu-Uyen [3 ]
Trinh, Viet-Anh [2 ]
Tran, Khanh-Duong [2 ]
Bui, Son-Anh [2 ]
Affiliations
[1] Shibaura Inst Technol, Coll Engn, Tokyo 1358548, Japan
[2] FPT Univ, Greenwich Vietnam, Hanoi 10000, Vietnam
[3] FPT Univ, Dept IT, Hanoi 10000, Vietnam
Source
IEEE ACCESS | 2025, Vol. 13
Keywords
Pose estimation; robot vision systems; intelligent systems; deep learning; supervised learning; machine vision;
DOI
10.1109/ACCESS.2025.3545049
Chinese Library Classification (CLC)
TP [Automation Technology; Computer Technology]
Subject Classification Code
0812
Abstract
In-hand object pose estimation is essential in engineering applications such as quality inspection, reverse engineering, and automated manufacturing. However, accurate pose estimation becomes difficult when the object is heavily occluded by the hand or blurred by motion. To address these challenges, we propose a novel framework that leverages transformers for spatial-temporal reasoning across video sequences: the model captures both spatial relationships within each frame and temporal dependencies across consecutive frames, aggregating information over time to improve pose predictions. A key innovation of our framework is a visibility-aware module that dynamically adjusts pose estimates according to the object's visibility. Operating on the temporally-aware features extracted by the transformers, this module aggregates pose information across multiple frames, so the model maintains high accuracy even when portions of the object are not visible in certain frames. This capability is particularly crucial in dynamic environments, where the object's appearance can change rapidly due to hand movements or interactions with other objects. Extensive experiments demonstrate that our method significantly outperforms state-of-the-art techniques, achieving a 6% improvement in overall accuracy and over 11% better performance under occlusion.
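The paper's implementation is not reproduced in this record. As a rough sketch of the mechanism the abstract describes, the following Python (PyTorch) example shows one plausible way to fuse per-frame pose estimates with learned visibility weights on top of a temporal transformer. All module names, feature dimensions, the 7-D translation-plus-quaternion pose parameterization, and the weighted-averaging scheme are illustrative assumptions, not the authors' architecture.

```python
# Illustrative sketch only -- NOT the authors' implementation.
# Assumes per-frame object features (e.g., from a CNN or ViT backbone)
# are already extracted; shows temporal reasoning + visibility-aware fusion.
import torch
import torch.nn as nn

class VisibilityAwarePoseTracker(nn.Module):
    def __init__(self, feat_dim=256, n_heads=8, n_layers=4, pose_dim=7):
        super().__init__()
        # Temporal transformer: each frame token attends to every other
        # frame in the clip, yielding temporally-aware features.
        layer = nn.TransformerEncoderLayer(
            d_model=feat_dim, nhead=n_heads, batch_first=True)
        self.temporal = nn.TransformerEncoder(layer, num_layers=n_layers)
        # Per-frame pose regressor (assumed: 3-D translation + quaternion).
        self.pose_head = nn.Linear(feat_dim, pose_dim)
        # Per-frame visibility score in [0, 1].
        self.vis_head = nn.Linear(feat_dim, 1)

    def forward(self, frame_feats):
        # frame_feats: (B, T, feat_dim) per-frame object features.
        h = self.temporal(frame_feats)          # (B, T, feat_dim)
        poses = self.pose_head(h)               # (B, T, pose_dim)
        vis = torch.sigmoid(self.vis_head(h))   # (B, T, 1)
        # Visibility-aware fusion: occluded frames get low weight, so the
        # fused estimate leans on frames where the object is well seen.
        w = vis / vis.sum(dim=1, keepdim=True).clamp_min(1e-6)
        fused = (w * poses).sum(dim=1)          # (B, pose_dim)
        return poses, vis.squeeze(-1), fused

if __name__ == "__main__":
    model = VisibilityAwarePoseTracker()
    feats = torch.randn(2, 8, 256)  # 2 clips, 8 frames each
    per_frame, visibility, fused = model(feats)
    print(per_frame.shape, visibility.shape, fused.shape)
```

Note that linearly averaging quaternion components is only a toy simplification; a faithful system would renormalize the quaternion or fuse rotations on SO(3).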
Pages: 35733-35749 (17 pages)
Related Papers (50 in total)
  • [31] Optical Proximity Sensing for Pose Estimation During In-Hand Manipulation
    Lancaster, Patrick
    Gyawali, Pratik
    Mavrogiannis, Christoforos
    Srinivasa, Siddhartha S.
    Smith, Joshua R.
    2022 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS (IROS), 2022, : 11818 - 11825
  • [32] Enhancing 2D Hand Pose Detection and Tracking in Surgical Videos by Attention Mechanism
    Nguyen, Quang-Dai
    Bui, Anh-Thuan
    Nguyen, Tri-Hai
    Do, Trong-Hop
    INTELLIGENT DISTRIBUTED COMPUTING XV, IDC 2022, 2023, 1089 : 168 - 177
  • [33] In-Hand Pose Estimation Using Hand-Mounted RGB Cameras and Visuotactile Sensors
    Gao, Yuan
    Matsuoka, Shogo
    Wan, Weiwei
    Kiyokawa, Takuya
    Koyama, Keisuke
    Harada, Kensuke
    IEEE ACCESS, 2023, 11 : 17218 - 17232
  • [34] In-hand manipulation in young children: Rotation of an object in the fingers
    Pehoski, C
    Henderson, A
    Tickle-Degnen, L
    AMERICAN JOURNAL OF OCCUPATIONAL THERAPY, 1997, 51 (07): : 544 - 552
  • [35] Visual-Tactile Sensing for In-Hand Object Reconstruction
    Xu, Wenqiang
    Yu, Zhenjun
    Xue, Han
    Ye, Ruolin
    Yao, Siqiong
    Lu, Cewu
    2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2023, : 8803 - 8812
  • [36] In-Hand Object Rotation via Rapid Motor Adaptation
    Qi, Haozhi
    Kumar, Ashish
    Calandra, Roberto
    Ma, Yi
    Malik, Jitendra
    CONFERENCE ON ROBOT LEARNING, VOL 205, 2022, 205 : 1722 - 1732
  • [37] MagicHand: In-Hand Perception of Object Characteristics for Dexterous Manipulation
    Li, Hui
    Yihun, Yimesker
    He, Hongsheng
    SOCIAL ROBOTICS, ICSR 2018, 2018, 11357 : 523 - 532
  • [38] Visibility Aware Human-Object Interaction Tracking from Single RGB Camera
    Xie, Xianghui
    Bhatnagar, Bharat Lal
    Pons-Moll, Gerard
    2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2023, : 4757 - 4768
  • [39] Object Tracking in Satellite Videos With Distractor-Occlusion-Aware Correlation Particle Filters
    Li, Yangfan
    Wang, Nan
    Li, Wei
    Li, Xiong
    Rao, Mengbin
    IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2024, 62 : 1 - 12
  • [40] Motion-Aware Correlation Filter-Based Object Tracking in Satellite Videos
    Lin, Bin
    Zheng, Jinlei
    Xue, Chaocan
    Fu, Lei
    Li, Ying
    Shen, Qiang
    IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2024, 62 : 1 - 13