3D hand pose and mesh estimation via a generic Topology-aware Transformer model

Citations: 0
Authors
Yu, Shaoqi [1, 2]
Wang, Yintong [1, 2]
Chen, Lili [1, 2]
Zhang, Xiaolin [1, 2, 3]
Li, Jiamao [1, 2]
Affiliations
[1] Chinese Acad Sci, Shanghai Inst Microsyst & Informat Technol, Shanghai, Peoples R China
[2] Univ Chinese Acad Sci, Beijing, Peoples R China
[3] ShanghaiTech Univ, Shanghai, Peoples R China
Keywords
3D hand pose estimation; HandGCNFormer; 3D hand mesh estimation; Graphformer; Transformer; GCN; regression
DOI
10.3389/fnbot.2024.1395652
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In Human-Robot Interaction (HRI), accurate 3D hand pose and mesh estimation is critically important. However, inferring plausible and accurate poses under severe self-occlusion and high self-similarity remains an inherent challenge. To alleviate the ambiguity caused by invisible and similar joints during HRI, we propose a new Topology-aware Transformer network named HandGCNFormer, which takes a depth image as input and incorporates prior knowledge of hand kinematic topology into the network while modeling long-range contextual information. Specifically, we propose a novel Graphformer decoder with an additional Node-offset Graph Convolutional layer (NoffGConv). The Graphformer decoder optimizes the synergy between the Transformer and GCN, capturing both long-range dependencies and local topological connections between joints. In addition, we replace the standard MLP prediction head with a novel Topology-aware head to better exploit local topological constraints for more plausible and accurate poses. Our method achieves state-of-the-art 3D hand pose estimation performance on four challenging datasets: Hands2017, NYU, ICVL, and MSRA. To further demonstrate the effectiveness and scalability of the proposed Graphformer decoder and Topology-aware head, we extend our framework to HandGCNFormer-Mesh for the 3D hand mesh estimation task. The extended framework efficiently integrates a shape regressor with the original Graphformer decoder and Topology-aware head, producing MANO parameters. On the HO-3D dataset, which contains varied and challenging occlusions, HandGCNFormer-Mesh achieves results competitive with previous state-of-the-art 3D hand mesh estimation methods.
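The abstract does not define NoffGConv, so the following is only a minimal sketch of the general idea of a graph convolution over the hand kinematic topology. It assumes a hypothetical 21-joint skeleton (wrist plus four joints per finger), the common symmetric normalization of the adjacency matrix, and a per-joint additive offset as one possible reading of "node-offset". The function names, joint ordering, and offset term are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical 21-joint hand skeleton (an assumption, datasets use different
# layouts): joint 0 is the wrist, each finger is a chain of 4 joints.
NUM_JOINTS = 21


def hand_edges():
    """Return the undirected bone list of the assumed kinematic tree."""
    edges = []
    for finger in range(5):
        base = 1 + 4 * finger
        edges.append((0, base))  # wrist to the first joint of the finger
        for k in range(3):
            edges.append((base + k, base + k + 1))  # chain along the finger
    return edges


def normalized_adjacency(edges, n=NUM_JOINTS):
    """Compute D^-1/2 (A + I) D^-1/2, the usual GCN normalization."""
    a = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for i, j in edges:
        a[i][j] = a[j][i] = 1.0
    deg = [sum(row) for row in a]
    return [[a[i][j] / (deg[i] * deg[j]) ** 0.5 for j in range(n)]
            for i in range(n)]


def graph_conv_with_offset(x, a_hat, weight, offset):
    """One graph convolution: a_hat @ x @ weight + offset.

    x:      n x c_in joint features
    weight: c_in x c_out shared projection
    offset: n x c_out per-joint term (assumed form of a "node offset")
    """
    n, c_in, c_out = len(x), len(weight), len(weight[0])
    # Aggregate features from each joint's skeletal neighbours.
    agg = [[sum(a_hat[i][k] * x[k][c] for k in range(n)) for c in range(c_in)]
           for i in range(n)]
    # Project the aggregated features and add the per-joint offset.
    return [[sum(agg[i][c] * weight[c][o] for c in range(c_in)) + offset[i][o]
             for o in range(c_out)]
            for i in range(n)]
```

In this sketch each joint only mixes features with its direct skeletal neighbours, which is the local topological constraint the abstract contrasts with the long-range dependencies captured by Transformer attention.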
Pages: 15