Real-Time Reinforcement Learning for Optimal Viewpoint Selection in Monocular 3D Human Pose Estimation

Cited by: 0

Authors:

Lee, Sanghyeon [1]

Hwang, Yoonho [1]

Lee, Jong Taek [1]

Affiliations:

[1] Kyungpook Natl Univ, Sch Comp Sci & Engn, Daegu 41566, South Korea

Source:

IEEE ACCESS | 2024 / Vol. 12

Funding:

National Research Foundation, Singapore;

Keywords:

Three-dimensional displays; Cameras; Real-time systems; Accuracy; Pose estimation; Heating systems; Uncertainty; Drones; Solid modeling; Feature extraction; 3D human pose estimation; next best viewpoint selection; deep learning; reinforcement learning;

DOI:

10.1109/ACCESS.2024.3514146

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Subject classification code:

0812;

Abstract:

Monocular 3D human pose estimation (HPE) presents an inherently ill-posed challenge, complicated by issues such as depth ambiguity and uncertainty. Estimating 3D poses with a single camera heavily depends on viewpoint, resulting in poor pose estimation accuracy. To address these challenges, we propose a real-time reinforcement learning-based viewpoint selection method that dynamically adjusts the camera viewpoint to optimize pose estimation. Our method extracts features encoding depth ambiguity and uncertainty from 2D-to-3D lifting, allowing the model to identify the optimal camera movements without requiring multiple cameras. We evaluate our approach on a publicly available real-world dataset, adjusted to simulate a realistic setting of drone flights capturing human motions. Our approach, compared against baseline strategies including fixed, random, and rotating camera movements with various 3D HPE models, significantly enhances the accuracy and robustness of pose estimation. In particular, it achieves a notable improvement, reducing pose estimation errors by over 30% compared to fixed and random camera movements. These results highlight the effectiveness of our method in optimizing viewpoint selection for real-time 3D HPE, making it a practical solution for single-camera setups in dynamic environments. Our code is available at https://github.com/knu-vis/nbv-pose.


Pages: 191020 - 191029

Number of pages: 10

