Densely occluded grasping objects detection based on RGB-D fusion

Cited: 0
Authors
Li M. [1,2]
Lu P. [1,2]
Zhu L. [1,2]
Zhu M.-Q. [1,2]
Zou L. [1,2]
Affiliations
[1] Engineering Research Center of Intelligent Control for Underground Space, Ministry of Education, Xuzhou
[2] School of Information and Control Engineering, China University of Mining and Technology, Xuzhou
Source
Kongzhi yu Juece/Control and Decision | 2023, Vol. 38, No. 10
Keywords
deep learning; densely occluded detection; grasp detection; manipulator; multi-scale detection; RGB-D fusion
DOI
10.13195/j.kzyjc.2022.0259
Abstract
Current grasp detection algorithms suffer from poor accuracy in densely occluded scenes, where data annotation is also time-consuming and expensive. To address this, an improved, step-by-step object detection and grasp detection solution based on RGB-D fusion is proposed, which allows a grasp detection model trained on single-object scenes to be applied directly to densely occluded multi-object scenes. First, considering the multi-scale characteristics of objects in densely occluded scenes, a sub-stage and path aggregation (SPA) multi-scale feature fusion module is proposed to enrich the high-dimensional feature representation of the middle-fusion detector SPA-YOLO-Fusion, which locates all objects in the scene. A GR-ConvNet with RGB-D pixel-level fusion then outputs the optimal grasp points for each detected object, while a background-padding preprocessing algorithm is proposed to reduce the interference of neighbouring objects on GR-ConvNet. On the LineMOD dataset, the mAP of SPA-YOLO-Fusion is 10% and 7% higher than that of YOLOv3-tiny and YOLOv4-tiny, respectively. On the YODO_Grasp dataset, collected from a real scene, the grasp detection accuracy of GR-ConvNet equipped with the padding algorithm is 23% higher than that of the original model. © 2023 Northeast University. All rights reserved.
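As a concrete illustration of the background-padding preprocessing described in the abstract, below is a minimal Python sketch of how one detected object's region might be isolated from its neighbours before grasp detection. This is not the authors' code: the function name pad_background, the grey colour fill, and the zero depth fill are all assumptions made for illustration.

import numpy as np

def pad_background(rgb, depth, box, fill_rgb=(127, 127, 127), fill_depth=0.0):
    """Keep only the pixels inside one detected bounding box so that
    neighbouring objects do not interfere with single-object grasp detection.

    rgb   -- H x W x 3 uint8 colour image
    depth -- H x W float32 depth map
    box   -- (x1, y1, x2, y2) bounding box from the object detector
    """
    h, w = depth.shape
    x1, y1, x2, y2 = box
    # Clamp the box to the image bounds.
    x1, y1 = max(0, x1), max(0, y1)
    x2, y2 = min(w, x2), min(h, y2)

    # Fill the whole frame with constant "background" values ...
    rgb_out = np.empty_like(rgb)
    rgb_out[:] = fill_rgb                      # broadcast colour over H x W
    depth_out = np.full_like(depth, fill_depth)

    # ... then copy back only the detected object's region.
    rgb_out[y1:y2, x1:x2] = rgb[y1:y2, x1:x2]
    depth_out[y1:y2, x1:x2] = depth[y1:y2, x1:x2]
    return rgb_out, depth_out

In the pipeline the abstract describes, a step like this would run once per bounding box produced by SPA-YOLO-Fusion, and each padded RGB-D pair would then be passed through GR-ConvNet exactly as in the single-object training setting.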
Pages: 2867-2874 (7 pages)