Hand Pose Estimation Based on Prior Knowledge and Mesh Supervision

Cited by: 0
Authors
Sun D. [1]
Zhang P. [1]
Affiliations
[1] School of Computer Science and Engineering, South China University of Technology, Guangzhou, Guangdong, China
Keywords
hand mesh; hand pose estimation; hand shape estimation; prior knowledge
DOI
10.12141/j.issn.1000-565X.230420
Abstract
Due to hand self-occlusion and the absence of depth information, 3D hand pose estimation from monocular RGB images struggles to recover the relative depth of joints accurately, and the estimated poses often violate the biomechanical constraints of the hand. To address this problem, a deep neural network based on prior knowledge and mesh supervision is proposed, combining the prior knowledge embedded in the hand structure with hand mesh information. The articulated structure of the hand skeleton implies a specific relationship between the projection of the 3D hand pose onto the 2D image plane and its extent along the depth direction, but individual differences in hand structure make this relationship difficult to describe explicitly and formally, so this paper proposes to fit it through learning. Further specific relationships hold between the joint positions and bone lengths of the same finger, between the bending directions of different segments of the same finger, and between the bending directions of different fingers; these are formulated as loss functions that supervise network training. The proposed network generates hand meshes in parallel with hand poses, uses mesh annotations to supervise training, and refines pose estimation without increasing network complexity. Furthermore, the network is trained on a mixed dataset to further improve its generalization capability. Experimental results show that the proposed method outperforms other methods in within-dataset cross-validation accuracy on multiple datasets, in cross-dataset validation accuracy, and in the time and space complexity of the model. The prior knowledge of the hand skeleton and the mesh supervision thus improve pose estimation accuracy while keeping the neural network compact. © 2024 South China University of Technology. All rights reserved.
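As a rough illustration of the skeletal priors described above, the sketch below (in PyTorch) expresses one such constraint, consistency of bone-length ratios within each finger, as a differentiable loss on predicted 3D joints. It is not the paper's formulation: the 21-joint layout, the FINGER_JOINTS index table, and the ref_ratios reference values are assumptions made for this example.

    import torch

    # Assumed 21-joint layout: wrist = 0, then four joints per finger (base to tip).
    FINGER_JOINTS = {
        "thumb":  [0, 1, 2, 3, 4],
        "index":  [0, 5, 6, 7, 8],
        "middle": [0, 9, 10, 11, 12],
        "ring":   [0, 13, 14, 15, 16],
        "pinky":  [0, 17, 18, 19, 20],
    }

    def bone_length_prior_loss(joints_3d, ref_ratios):
        """Penalize per-finger bone-length ratios that deviate from reference ratios.

        joints_3d:  (B, 21, 3) predicted joint positions.
        ref_ratios: dict mapping finger name to a (3,) tensor of reference ratios
                    between consecutive bones (assumed anatomical values).
        """
        loss = joints_3d.new_zeros(())
        for finger, idx in FINGER_JOINTS.items():
            pts = joints_3d[:, idx, :]                    # (B, 5, 3) joints along this finger
            bones = pts[:, 1:, :] - pts[:, :-1, :]        # (B, 4, 3) bone vectors
            lengths = bones.norm(dim=-1).clamp(min=1e-8)  # (B, 4) bone lengths
            ratios = lengths[:, 1:] / lengths[:, :-1]     # (B, 3) consecutive-length ratios
            loss = loss + (ratios - ref_ratios[finger]).abs().mean()
        return loss / len(FINGER_JOINTS)

    # Example usage with dummy data (assumed reference ratios):
    # joints = torch.randn(8, 21, 3)
    # refs = {f: torch.tensor([0.7, 0.8, 0.9]) for f in FINGER_JOINTS}
    # prior = bone_length_prior_loss(joints, refs)

Ratios of consecutive bone lengths, rather than absolute lengths, keep such a prior scale-invariant across hands of different sizes; an analogous term could penalize implausible bending directions.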
Pages: 138-147
Page count: 9