A hybrid network for estimating 3D interacting hand pose from a single RGB image

Cited by: 0
Authors
Bao, Wenxia [1]
Gao, Qiuyue [1]
Yang, Xianjun [2]
Affiliations
[1] Anhui Univ, Sch Elect & Informat Engn, Hefei 230601, Anhui, Peoples R China
[2] Chinese Acad Sci, Hefei Inst Phys Sci, Hefei 230031, Anhui, Peoples R China
Keywords
3D hand pose estimation; Interacting hand; Hybrid network; End-to-end network; TEXT; RECOGNITION; KHATT
DOI
10.1007/s11760-024-03043-1
Chinese Library Classification (CLC) numbers
TM [Electrical engineering]; TN [Electronics and communication technology]
Subject classification codes
0808; 0809
Abstract
Estimating 3D interacting hand pose from a single RGB image is a challenging problem: in two-handed interactions, the hands tend to occlude each other and look alike. This study proposes HybridPoseNet, a simple and accurate end-to-end framework for estimating 3D interacting hand pose. The network follows an encoder-decoder architecture. The feature encoder is a hybrid structure that combines a convolutional neural network (CNN) with a transformer to encode hand information: an ordinary CNN extracts local detailed features from the input image, and a vision transformer captures long-range spatial interactions among feature vectors at different positions. The 3D pose decoder has separate left-hand and right-hand branches, which are fused by a feature enhancement module (FEM) that helps reduce the appearance ambiguity caused by the self-similarity of the two hands. The decoder lifts the 2D pose to a 3D pose by estimating two depth components. Ablation experiments demonstrate the effectiveness of each module in the network, and comprehensive experiments on the InterHand2.6M dataset show that the proposed method outperforms previous state-of-the-art methods for interacting hand pose estimation.
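The abstract states that the decoder obtains the 3D pose from the 2D pose by estimating two depth components, but it does not say what those components are. A common decomposition in hand pose work is an absolute depth for the hand's root joint plus a per-joint depth relative to that root; the sketch below assumes that decomposition and a pinhole camera with known intrinsics (fx, fy, cx, cy), and shows how 2D keypoints plus such depths can be back-projected to camera-space 3D joints. The function name `lift_to_3d` and all of its parameters are hypothetical illustrations, not the paper's implementation.

```python
import numpy as np


def lift_to_3d(uv, root_depth, rel_depth, fx, fy, cx, cy):
    """Back-project 2D joints to camera-space 3D joints (hypothetical helper).

    Assumes the two depth components are an absolute root depth (one scalar
    per hand) and a per-joint depth offset relative to that root, and that
    the camera follows a pinhole model with intrinsics fx, fy, cx, cy.

    uv:         (J, 2) pixel coordinates (u, v) for J joints
    root_depth: absolute depth of the root joint (e.g. the wrist)
    rel_depth:  (J,) per-joint depths relative to the root
    returns:    (J, 3) camera-space coordinates (X, Y, Z)
    """
    uv = np.asarray(uv, dtype=float)
    # Absolute depth of every joint = root depth + relative offset.
    z = root_depth + np.asarray(rel_depth, dtype=float)
    # Invert the pinhole projection u = fx * X / Z + cx, v = fy * Y / Z + cy.
    x = (uv[:, 0] - cx) * z / fx
    y = (uv[:, 1] - cy) * z / fy
    return np.stack([x, y, z], axis=1)


# Usage: two joints of one hand, with made-up intrinsics and depths (mm).
uv = [[320.0, 240.0], [420.0, 290.0]]
joints = lift_to_3d(uv, root_depth=500.0, rel_depth=[0.0, 20.0],
                    fx=500.0, fy=500.0, cx=320.0, cy=240.0)
# joints[0] is (0.0, 0.0, 500.0); joints[1] is (104.0, 52.0, 520.0)
print(joints)
```

For two interacting hands, the same back-projection would be applied once per hand, each with its own root depth; splitting depth this way lets a network predict a small, translation-invariant relative depth per joint and handle the harder absolute-scale estimate separately.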
Pages: 3801-3814
Number of pages: 14
Related papers (50 in total)
  • [31] Context-Aware Network for 3D Human Pose Estimation from Monocular RGB Image
    Yin, Binyi
    Zhang, Dongbo
    Li, Shuai
    Hao, Aimin
    Qin, Hong
    2019 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2019
  • [32] Skeleton Transformer Networks: 3D Human Pose and Skinned Mesh from Single RGB Image
    Yoshiyasu, Yusuke
    Sagawa, Ryusuke
    Ayusawa, Ko
    Murai, Akihiko
    COMPUTER VISION - ACCV 2018, PT IV, 2019, 11364 : 485 - 500
  • [33] Multi-virtual View Scoring Network for 3D Hand Pose Estimation from a Single Depth Image
    Tian, Yimeng
    Li, Chen
    Tian, Lihua
    ARTIFICIAL INTELLIGENCE AND ROBOTICS, ISAIR 2023, 2024, 1998 : 147 - 164
  • [34] Using a single RGB frame for real time 3D hand pose estimation in the wild
    Panteleris, Paschalis
    Oikonomidis, Iason
    Argyros, Antonis
    2018 IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV 2018), 2018, 436 - 445
  • [36] 3D hand mesh reconstruction from a monocular RGB image
    Peng, Hao
    Xian, Chuhua
    Zhang, Yunbo
    VISUAL COMPUTER, 2020, 36 (10-12) : 2227 - 2239
  • [37] 3D Hand Pose Estimation From Monocular RGB With Feature Interaction Module
    Guo, Shaoxiang
    Rigall, Eric
    Ju, Yakun
    Dong, Junyu
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2022, 32 (08) : 5293 - 5306
  • [38] 3D Hand Pose Detection in Egocentric RGB-D Images
    Rogez, Gregory
    Khademi, Maryam
    Supancic, J. S., III
    Montiel, J. M. M.
    Ramanan, Deva
    COMPUTER VISION - ECCV 2014 WORKSHOPS, PT I, 2015, 8925 : 356 - 371
  • [40] Deep network based 3D hand keypoints prediction from single RGB images
    Wang, Jialong
    Sang, Nong
    PATTERN RECOGNITION AND TRACKING XXX, 2019, 10995