3D hand pose and mesh estimation via a generic Topology-aware Transformer model

Authors
Yu, Shaoqi [1, 2]
Wang, Yintong [1, 2]
Chen, Lili [1, 2]
Zhang, Xiaolin [1, 2, 3]
Li, Jiamao [1, 2]
Affiliations
[1] Chinese Acad Sci, Shanghai Inst Microsyst & Informat Technol, Shanghai, Peoples R China
[2] Univ Chinese Acad Sci, Beijing, Peoples R China
[3] ShanghaiTech Univ, Shanghai, Peoples R China
Source
FRONTIERS IN NEUROROBOTICS | 2024 / Vol. 18
Keywords
3D hand pose estimation; HandGCNFormer; 3D hand mesh estimation; Graphformer; Transformer; GCN; REGRESSION;
DOI
10.3389/fnbot.2024.1395652
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In Human-Robot Interaction (HRI), accurate 3D hand pose and mesh estimation is critically important. However, inferring reasonable and accurate poses under severe self-occlusion and high self-similarity remains an inherent challenge. To alleviate the ambiguity caused by invisible and similar joints during HRI, we propose a new Topology-aware Transformer network named HandGCNFormer, which takes a depth image as input and incorporates prior knowledge of hand kinematic topology into the network while modeling long-range contextual information. Specifically, we propose a novel Graphformer decoder with an additional Node-offset Graph Convolutional layer (NoffGConv). The Graphformer decoder optimizes the synergy between the Transformer and GCN, capturing both long-range dependencies and local topological connections between joints. In addition, we replace the standard MLP prediction head with a novel Topology-aware head to better exploit local topological constraints for more reasonable and accurate poses. Our method achieves state-of-the-art 3D hand pose estimation performance on four challenging datasets: Hands2017, NYU, ICVL, and MSRA. To further demonstrate the effectiveness and scalability of the proposed Graphformer decoder and Topology-aware head, we extend our framework to HandGCNFormer-Mesh for the 3D hand mesh estimation task. The extended framework efficiently integrates a shape regressor with the original Graphformer decoder and Topology-aware head, producing MANO parameters. Results on the HO-3D dataset, which contains varied and challenging occlusions, show that HandGCNFormer-Mesh is competitive with previous state-of-the-art 3D hand mesh estimation methods.
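To make the idea of "local topological connections between joints" concrete, the following is a minimal sketch of one graph-convolution step over a hand skeleton. It is not the paper's NoffGConv: the abstract does not define that layer, so the 21-joint layout, the function names (`hand_edges`, `normalized_adjacency`, `graph_conv`), and the per-node additive offset are all assumptions chosen for illustration only.

```python
import math

# Hypothetical layout: joint 0 is the wrist and each of the 5 fingers is a
# chain of 4 joints. Real datasets use their own joint counts and orderings.
NUM_JOINTS = 21


def hand_edges():
    """Return the bone list (pairs of joint indices) of the assumed skeleton."""
    edges = []
    for finger in range(5):
        base = 1 + 4 * finger
        edges.append((0, base))  # wrist to first joint of the finger
        for k in range(3):
            edges.append((base + k, base + k + 1))  # bones along the finger
    return edges


def normalized_adjacency(edges, n):
    """Compute D^{-1/2} (A + I) D^{-1/2}, the usual GCN normalization."""
    a = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for i, j in edges:
        a[i][j] = 1.0
        a[j][i] = 1.0
    deg = [sum(row) for row in a]
    return [[a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]


def graph_conv(features, a_hat, weight, node_offset):
    """One graph-convolution step: A_hat @ (X @ W) + per-node offset.

    features: n x d_in, weight: d_in x d_out, node_offset: n x d_out.
    The additive per-node offset is an illustrative assumption only.
    """
    n, d_in, d_out = len(features), len(weight), len(weight[0])
    projected = [[sum(features[i][k] * weight[k][c] for k in range(d_in))
                  for c in range(d_out)] for i in range(n)]
    return [[sum(a_hat[i][j] * projected[j][c] for j in range(n))
             + node_offset[i][c] for c in range(d_out)] for i in range(n)]


if __name__ == "__main__":
    a_hat = normalized_adjacency(hand_edges(), NUM_JOINTS)
    x = [[1.0] for _ in range(NUM_JOINTS)]       # one scalar feature per joint
    offset = [[0.0] for _ in range(NUM_JOINTS)]  # zero offsets for the demo
    out = graph_conv(x, a_hat, [[1.0]], offset)
    print(len(out), len(out[0]))
```

Each joint's output mixes only its own feature and those of joints it shares a bone with, which is the local counterpart to the all-pairs mixing that Transformer attention provides.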
Pages: 15