DiffuPose: Monocular 3D Human Pose Estimation via Denoising Diffusion Probabilistic Model

Cited by: 7

Authors:

Choi, Jeongjun [1,2]

Shim, Dongseok [1]

Kim, H. Jin [1,2]

Affiliations:

[1] Seoul National University, Artificial Intelligence Institute (AIIS), Seoul, South Korea

[2] Automation and Systems Research Institute (ASRI), Seoul, South Korea

Source:

2023 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) | 2023

Funding:

National Research Foundation, Singapore;

Keywords:

DOI:

10.1109/IROS55552.2023.10342204

Chinese Library Classification (CLC) number:

TP18 [Artificial Intelligence Theory];

Discipline classification code:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Thanks to the development of 2D keypoint detectors, monocular 3D human pose estimation (HPE) via 2D-to-3D uplifting has improved remarkably. Still, monocular 3D HPE remains challenging because of inherent depth ambiguity and occlusion. Many previous works exploit temporal information to mitigate these difficulties. However, in many real-world applications frame sequences are not accessible. This paper focuses on reconstructing a 3D pose from a single 2D keypoint detection. Rather than exploiting temporal information, we alleviate the depth ambiguity by generating multiple 3D pose candidates that map to the same 2D keypoints. We build a novel diffusion-based framework to effectively sample diverse 3D poses from an off-the-shelf 2D detector. By replacing the conventional denoising U-Net with a graph convolutional network, which accounts for the correlation between human joints, our approach achieves further performance improvements. We evaluate our method on the widely adopted Human3.6M and HumanEva-I datasets. Comprehensive experiments demonstrate the efficacy of the proposed method and confirm that our model outperforms state-of-the-art multi-hypothesis 3D HPE methods.
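The multi-hypothesis idea in the abstract can be illustrated with a toy reverse-diffusion sampler. The sketch below is not the authors' implementation: it assumes an orthographic camera (so a 3D joint's x and y equal its 2D keypoint), a linear noise schedule, and a hypothetical `toy_noise_predictor` that stands in for the learned graph convolutional denoiser. All function names and parameters are illustrative.

```python
import math
import random


def linear_betas(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linear noise schedule (an assumed, commonly used choice)."""
    step = (beta_end - beta_start) / max(num_steps - 1, 1)
    return [beta_start + step * t for t in range(num_steps)]


def toy_noise_predictor(x_t, keypoints_2d, alpha_bar_t):
    """Hypothetical stand-in for the learned denoiser (a GCN in the paper).

    x, y: the noise implied by pinning the clean pose to the 2D keypoints.
    z:    the optimal noise estimate if depth had a standard normal prior.
    """
    s_signal = math.sqrt(alpha_bar_t)
    s_noise = math.sqrt(1.0 - alpha_bar_t)
    eps = []
    for (x, y, z), (u, v) in zip(x_t, keypoints_2d):
        eps.append(((x - s_signal * u) / s_noise,
                    (y - s_signal * v) / s_noise,
                    s_noise * z))
    return eps


def sample_pose(keypoints_2d, betas, rng):
    """One DDPM ancestral-sampling pass, conditioned on the 2D keypoints."""
    alphas = [1.0 - b for b in betas]
    alpha_bars, prod = [], 1.0
    for a in alphas:
        prod *= a
        alpha_bars.append(prod)

    # Start from pure Gaussian noise: one (x, y, z) triple per joint.
    x = [[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in keypoints_2d]
    for t in reversed(range(len(betas))):
        eps = toy_noise_predictor(x, keypoints_2d, alpha_bars[t])
        coef = betas[t] / math.sqrt(1.0 - alpha_bars[t])
        sigma = math.sqrt(betas[t]) if t > 0 else 0.0  # no noise on the last step
        x = [[(xi - coef * ei) / math.sqrt(alphas[t]) + sigma * rng.gauss(0.0, 1.0)
              for xi, ei in zip(joint, e)]
             for joint, e in zip(x, eps)]
    return x


def sample_hypotheses(keypoints_2d, num_hypotheses=5, num_steps=50, seed=0):
    """Draw several 3D pose candidates for the same 2D detection."""
    rng = random.Random(seed)
    betas = linear_betas(num_steps)
    return [sample_pose(keypoints_2d, betas, rng) for _ in range(num_hypotheses)]


if __name__ == "__main__":
    # Illustrative 17-joint 2D detection (made-up coordinates).
    keypoints = [(0.1 * j, -0.05 * j) for j in range(17)]
    for pose in sample_hypotheses(keypoints, num_hypotheses=3):
        print([round(c, 3) for c in pose[1]])
```

In this toy setup every sampled pose reprojects onto the same 2D keypoints while the depths differ between samples, which is the ambiguity the multi-hypothesis formulation is meant to capture. A trained denoiser would replace the hand-written predictor with a network that has learned plausible depth from data.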


Pages: 3773-3780

Page count: 8

