DiffuPose: Monocular 3D Human Pose Estimation via Denoising Diffusion Probabilistic Model

Cited by: 7
Authors
Choi, Jeongjun [1 ,2 ]
Shim, Dongseok [1 ]
Kim, H. Jin [1 ,2 ]
Affiliations
[1] Seoul Natl Univ, Artificial Intelligence Inst AIIS, Seoul, South Korea
[2] Automat & Syst Res Inst ASRI, Seoul, South Korea
Funding
National Research Foundation, Singapore;
Keywords
DOI
10.1109/IROS55552.2023.10342204
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Thanks to the development of 2D keypoint detectors, monocular 3D human pose estimation (HPE) via 2D-to-3D uplifting has achieved remarkable improvements. Still, monocular 3D HPE remains a challenging problem because of inherent depth ambiguities and occlusions. Many previous works exploit temporal information to mitigate these difficulties. However, there are many real-world applications where frame sequences are not accessible. This paper focuses on reconstructing a 3D pose from a single 2D keypoint detection. Rather than exploiting temporal information, we alleviate the depth ambiguity by generating multiple 3D pose candidates that can be mapped to an identical 2D keypoint. We build a novel diffusion-based framework to effectively sample diverse 3D poses from the output of an off-the-shelf 2D detector. By replacing the conventional denoising U-Net with a graph convolutional network, which accounts for the correlation between human joints, our approach achieves further performance improvements. We evaluate our method on the widely adopted Human3.6M and HumanEva-I datasets. Comprehensive experiments demonstrate the efficacy of the proposed method and confirm that our model outperforms state-of-the-art multi-hypothesis 3D HPE methods.
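The multi-hypothesis idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the 5-joint chain skeleton, the linear noise schedule, the step count, and the orthographic-projection assumption are all hypothetical, and `toy_noise_predictor` is an untrained stand-in for a learned graph-convolutional denoiser. The sketch only shows the general shape of DDPM ancestral sampling conditioned on one 2D detection, repeated to obtain several 3D candidates.

```python
import math
import random

# Hypothetical 5-joint chain skeleton; a real model would use the dataset's skeleton.
EDGES = [(0, 1), (1, 2), (2, 3), (3, 4)]
NUM_JOINTS = 5
STEPS = 200  # assumed number of diffusion steps

# Neighbor lists with self-loops: the connectivity a graph convolution aggregates over.
ADJ = {j: [j] for j in range(NUM_JOINTS)}
for a, b in EDGES:
    ADJ[a].append(b)
    ADJ[b].append(a)

# Linear noise schedule (assumed) and cumulative products of alpha_t = 1 - beta_t.
BETAS = [1e-4 + (0.05 - 1e-4) * t / (STEPS - 1) for t in range(STEPS)]
ALPHA_BARS = []
running = 1.0
for beta in BETAS:
    running *= 1.0 - beta
    ALPHA_BARS.append(running)


def toy_noise_predictor(x_t, keypoints_2d, t):
    """Untrained stand-in for a learned GCN denoiser (illustration only).

    It guesses the clean pose as: x, y copied from the 2D keypoints
    (orthographic-projection assumption) and depth averaged over graph
    neighbors, then converts that guess into a noise estimate.
    """
    ab = ALPHA_BARS[t]
    eps = []
    for j in range(NUM_JOINTS):
        depth = sum(x_t[n][2] for n in ADJ[j]) / len(ADJ[j])
        x0_hat = (keypoints_2d[j][0], keypoints_2d[j][1], depth)
        eps.append([(x_t[j][d] - math.sqrt(ab) * x0_hat[d]) / math.sqrt(1.0 - ab)
                    for d in range(3)])
    return eps


def sample_pose(keypoints_2d, rng):
    """Run DDPM ancestral sampling from Gaussian noise to one 3D pose."""
    x = [[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(NUM_JOINTS)]
    for t in reversed(range(STEPS)):
        eps = toy_noise_predictor(x, keypoints_2d, t)
        beta, ab = BETAS[t], ALPHA_BARS[t]
        scale = beta / math.sqrt(1.0 - ab)
        sigma = math.sqrt(beta) if t > 0 else 0.0  # no noise on the final step
        x = [[(x[j][d] - scale * eps[j][d]) / math.sqrt(1.0 - beta)
              + sigma * rng.gauss(0.0, 1.0) for d in range(3)]
             for j in range(NUM_JOINTS)]
    return x


def sample_hypotheses(keypoints_2d, num_hypotheses, seed=0):
    """Draw several 3D pose candidates for the same 2D detection."""
    rng = random.Random(seed)
    return [sample_pose(keypoints_2d, rng) for _ in range(num_hypotheses)]


keypoints = [(0.0, 0.0), (0.1, 0.4), (0.2, 0.8), (0.3, 1.2), (0.4, 1.6)]
hypotheses = sample_hypotheses(keypoints, num_hypotheses=5)
```

Each entry of `hypotheses` is a list of per-joint `[x, y, z]` coordinates. In this toy setup every candidate agrees with the input keypoints in x and y while the depths differ between candidates, which mirrors the depth ambiguity that multi-hypothesis methods aim to represent.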
Pages: 3773-3780
Page count: 8
Related papers
50 in total
  • [11] On Boosting Single-Frame 3D Human Pose Estimation via Monocular Videos
    Li, Zhi
    Wang, Xuan
    Wang, Fei
    Jiang, Peilin
    2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019), 2019, : 2192 - 2201
  • [12] Monocular 3D Human Pose Estimation by Generation and Ordinal Ranking
    Sharma, Saurabh
    Varigonda, Pavan Teja
    Bindal, Prashast
    Sharma, Abhishek
    Jain, Arjun
    2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019), 2019, : 2325 - 2334
  • [13] Double chain networks for monocular 3D human pose estimation
    Bai, Guihu
    Luo, Yanmin
    Pan, Xueliang
    Wang, Youjie
    Wang, Jia
    Guo, Jingming
    IMAGE AND VISION COMPUTING, 2022, 123
  • [14] Deep Kinematics Analysis for Monocular 3D Human Pose Estimation
    Xu, Jingwei
    Yu, Zhenbo
    Ni, Bingbing
    Yang, Jiancheng
    Yang, Xiaokang
    Zhang, Wenjun
    2020 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2020, : 896 - 905
  • [15] Monocular 3D Human Pose Estimation by Predicting Depth on Joints
    Nie, Bruce Xiaohan
    Wei, Ping
    Zhu, Song-Chun
    2017 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2017, : 3467 - 3475
  • [16] TSwinPose: Enhanced monocular 3D human pose estimation with JointFlow
    Li, Muyu
    Hu, Henan
    Xiong, Jingjing
    Zhao, Xudong
    Yan, Hong
    EXPERT SYSTEMS WITH APPLICATIONS, 2024, 249
  • [17] LEARNING MONOCULAR 3D HUMAN POSE ESTIMATION WITH SKELETAL INTERPOLATION
    Chen, Ziyi
    Sugimoto, Akihiro
    Lai, Shang-Hong
    2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 4218 - 4222
  • [18] On the Effect of Temporal Information on Monocular 3D Human Pose Estimation
    Brauer, Juergen
    Gong, Wenjuan
    Gonzalez, Jordi
    Arens, Michael
    2011 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION WORKSHOPS (ICCV WORKSHOPS), 2011,
  • [19] Locally Connected Network for Monocular 3D Human Pose Estimation
    Ci, Hai
    Ma, Xiaoxuan
    Wang, Chunyu
    Wang, Yizhou
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2022, 44 (03) : 1429 - 1442
  • [20] Staged cascaded network for monocular 3D human pose estimation
    Bing-kun Gao
    Zhong-xin Zhang
    Cui-na Wu
    Chen-lei Wu
    Hong-bo Bi
    Applied Intelligence, 2023, 53 : 1021 - 1029