Real-time estimation method of target 3D pose based on multi-branch architecture

Cited by: 0

Authors:

Hong Y. [1,2]

Liu J. [1]

Luo S. [2]

Chen X. [2]

Li D. [1]

Zhang Q. [3]

Affiliations:

[1] State Key Laboratory of Information Engineering in Surveying, Mapping and Remote Sensing, Wuhan University, Wuhan

[2] Mobile Broadcasting and Information Service Industry Innovation Research Institute (Wuhan) Co. Ltd., Ezhou

[3] College of Intelligent Systems Science and Engineering, Harbin Engineering University, Harbin

Source:

Zhongguo Guanxing Jishu Xuebao/Journal of Chinese Inertial Technology | 2024, Vol. 32, No. 4

Keywords:

multi-branch architecture; object detection; pose estimation; quaternion orientation; three-dimensional pose

DOI:

10.13695/j.cnki.12-1222/o3.2024.04.003

Abstract:

A real-time method for estimating the 3D pose of a target based on a multi-branch architecture is proposed. It addresses the low accuracy of real-time pose estimation and the slow convergence of regression models caused by the wide range of target sizes and positions in vehicle-road cooperative scenarios. Building on the model architecture of a 2D object detection algorithm, a pose estimation branch is designed that outputs the target's pose quaternion, its 3D position relative to the camera, and its size. In the training stage, dedicated loss functions are designed for the pose estimation branch: a vector unitisation (normalization) operator is used to regress the pose quaternion, and a logarithmic remapping algorithm is used to regress the target dimensions and target-to-camera distances. In the inference stage, the 3D pose of the target is solved from the quaternion and the target-to-camera distance output by the model, yielding an accurate pose estimate. In the pose accuracy verification on the OVRC dataset, the maximum mean square error of the position coordinates is 1.94 m, and the maximum mean square error of the estimated attitude angles is 3.98°. In the relative accuracy test on the KITTI dataset, the detection accuracy is improved by 3.22% compared with the PVNet method, and the inference efficiency is doubled compared with the 3DBB method. © 2024 Editorial Department of Journal of Chinese Inertial Technology. All rights reserved.
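
The abstract names two regression devices without giving their formulas: unit normalization of the predicted quaternion and logarithmic remapping of sizes and distances. The following is a minimal sketch of what such operators commonly look like, not the authors' implementation. The function names, the `scale` parameter, the identity-rotation fallback, the (w, x, y, z) ordering, and the ZYX Euler convention are all assumptions introduced for illustration.

```python
import math


def normalize_quaternion(raw):
    """Scale a raw 4-vector to unit length so it represents a valid rotation."""
    norm = math.sqrt(sum(c * c for c in raw))
    if norm < 1e-12:
        # Assumption: fall back to the identity rotation for a degenerate output.
        return (1.0, 0.0, 0.0, 0.0)
    return tuple(c / norm for c in raw)


def log_encode(value, scale=1.0):
    """Map a positive size or distance to a log-space regression target.

    `scale` is a hypothetical reference value (for example a mean size).
    """
    return math.log(value / scale)


def log_decode(target, scale=1.0):
    """Invert log_encode to recover the size or distance."""
    return scale * math.exp(target)


def quaternion_to_euler_deg(q):
    """Convert a unit quaternion (w, x, y, z) to roll, pitch, yaw in degrees.

    Assumption: ZYX (yaw-pitch-roll) convention.
    """
    w, x, y, z = q
    roll = math.atan2(2.0 * (w * x + y * z), 1.0 - 2.0 * (x * x + y * y))
    # Clamp to guard against rounding slightly outside [-1, 1].
    sin_pitch = max(-1.0, min(1.0, 2.0 * (w * y - z * x)))
    pitch = math.asin(sin_pitch)
    yaw = math.atan2(2.0 * (w * z + x * y), 1.0 - 2.0 * (y * y + z * z))
    return tuple(math.degrees(a) for a in (roll, pitch, yaw))


# Usage: a raw network output is normalized, then read out as attitude angles.
q = normalize_quaternion((2.0, 0.0, 0.0, 2.0))
roll_deg, pitch_deg, yaw_deg = quaternion_to_euler_deg(q)

# Usage: a 50 m target-to-camera distance round-trips through log space.
distance_m = log_decode(log_encode(50.0))
```

Regressing in log space compresses the wide span of distances and sizes into a narrower numeric range, which is one plausible reason such a remapping would help convergence in the scenario the abstract describes.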

Pages: 336-345

Page count: 9
