Attention-Based Grasp Detection With Monocular Depth Estimation

Cited by: 1
Authors
Xuan Tan, Phan [1 ]
Hoang, Dinh-Cuong [2 ]
Nguyen, Anh-Nhat [3 ]
Nguyen, Van-Thiep [3 ]
Vu, Van-Duc [3 ]
Nguyen, Thu-Uyen [3 ]
Hoang, Ngoc-Anh [3 ]
Phan, Khanh-Toan [3 ]
Tran, Duc-Thanh [3 ]
Vu, Duy-Quang [3 ]
Ngo, Phuc-Quan [2 ]
Duong, Quang-Tri [2 ]
Ho, Ngoc-Trung [3 ]
Tran, Cong-Trinh [3 ]
Duong, Van-Hiep [3 ]
Mai, Anh-Truong [3 ]
Affiliations
[1] Shibaura Inst Technol, Coll Engn, Tokyo 1358548, Japan
[2] FPT Univ, Greenwich Vietnam, Hanoi 10000, Vietnam
[3] FPT Univ, IT Dept, Hanoi 10000, Vietnam
Keywords
Pose estimation; robot vision systems; intelligent systems; deep learning; supervised learning; machine vision;
DOI
10.1109/ACCESS.2024.3397718
CLC Number
TP [Automation Technology, Computer Technology]
Discipline Code
0812
Abstract
Grasp detection plays a pivotal role in robotic manipulation, allowing robots to interact with and manipulate objects in their surroundings. Traditionally, this has relied on three-dimensional (3D) point cloud data acquired from specialized depth cameras. However, the limited availability of such sensors in real-world scenarios poses a significant challenge. In many practical applications, robots operate in diverse environments where obtaining high-quality 3D point cloud data may be impractical or impossible. This paper introduces an innovative approach to grasp generation using color images, thereby eliminating the need for dedicated depth sensors. Our method capitalizes on advanced deep learning techniques for depth estimation directly from color images. Instead of relying on conventional depth sensors, our approach computes predicted point clouds based on estimated depth images derived directly from Red-Green-Blue (RGB) input data. To our knowledge, this is the first study to explore the use of predicted depth data for grasp detection, moving away from the traditional dependence on depth sensors. The novelty of this work is the development of a fusion module that seamlessly integrates features extracted from RGB images with those inferred from the predicted point clouds. Additionally, we adapt a voting mechanism from our previous work (VoteGrasp) to enhance robustness to occlusion and generate collision-free grasps. Experimental evaluations conducted on standard datasets validate the effectiveness of our approach, demonstrating its superior performance in generating grasp configurations compared to existing methods. With our proposed method, we achieved a significant 4% improvement in average precision compared to state-of-the-art grasp detection methods. Furthermore, our method demonstrates promising practical viability through real robot grasping experiments, achieving an impressive 84% success rate.
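The abstract states that the method computes predicted point clouds from depth images estimated from RGB input. A minimal sketch of that back-projection step under the standard pinhole camera model is shown below; the function name and the intrinsic parameters (fx, fy, cx, cy) are illustrative placeholders, not values taken from the paper.

```python
import numpy as np

def depth_to_point_cloud(depth, fx, fy, cx, cy):
    """Back-project a depth image (H x W, in metres) into an N x 3 point
    cloud using the pinhole camera model with intrinsics fx, fy, cx, cy."""
    h, w = depth.shape
    # Pixel coordinate grids: u runs over columns, v over rows.
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    points = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    # Drop pixels with no valid depth prediction (z == 0).
    return points[points[:, 2] > 0]

# Synthetic 2x2 "predicted" depth map for illustration:
depth = np.array([[1.0, 2.0],
                  [0.0, 4.0]])
pts = depth_to_point_cloud(depth, fx=500.0, fy=500.0, cx=1.0, cy=1.0)
```

The zero-depth pixel is discarded, so the 2x2 map yields three 3D points; in the paper's pipeline, features extracted from such a predicted cloud are fused with RGB features before grasp voting.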
Pages: 65041-65057
Page count: 17
Related Papers
(50 in total)
  • [41] Transformer-based monocular depth estimation with hybrid attention fusion and progressive regression
    Liu, Peng
    Zhang, Zonghua
    Meng, Zhaozong
    Gao, Nan
    NEUROCOMPUTING, 2025, 620
  • [43] A multiscale dilated convolution and mixed-order attention-based deep neural network for monocular depth prediction
    Xu, Huihui
    Li, Fei
SN APPLIED SCIENCES, 2023, 5 (01)
  • [44] Attention-based Multi-Level Fusion Network for Light Field Depth Estimation
    Chen, Jiaxin
    Zhang, Shuo
    Lin, Youfang
    THIRTY-FIFTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THIRTY-THIRD CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE AND THE ELEVENTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2021, 35 : 1009 - 1017
  • [45] Depth Estimation Using Monocular Camera for Real-World Multi-Object Grasp Detection for Robotic Arm
    Jain, Vayam
    Gupta, Arjun
    Srivastva, Astik
    Aggarwal, Nilesh
    Anunay
    Harikesh
    PROCEEDINGS OF 2023 THE 8TH INTERNATIONAL CONFERENCE ON SYSTEMS, CONTROL AND COMMUNICATIONS, ICSCC 2023, 2023, : 7 - 18
  • [46] Out-of-Distribution Detection for Monocular Depth Estimation
    Hornauer, Julia
    Holzbock, Adrian
    Belagiannis, Vasileios
    2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION, ICCV, 2023, : 1911 - 1921
  • [47] YOLO MDE: Object Detection with Monocular Depth Estimation
    Yu, Jongsub
    Choi, Hyukdoo
    ELECTRONICS, 2022, 11 (01)
  • [48] Monocular Depth Estimation Based on Unsupervised Learning
    Liu, Wan
    Sun, Yan
    Wang, XuCheng
    Yang, Lin
    Zheng, Zhenrong
    OPTOELECTRONIC IMAGING AND MULTIMEDIA TECHNOLOGY VI, 2019, 11187
  • [49] Monocular depth estimation based on dense connections
    Wang, Quande
    Cheng, Kai
    Huazhong Keji Daxue Xuebao (Ziran Kexue Ban)/Journal of Huazhong University of Science and Technology (Natural Science Edition), 2023, 51 (11): : 75 - 82
  • [50] Attention based multilayer feature fusion convolutional neural network for unsupervised monocular depth estimation
    Lei, Zeyu
    Wang, Yan
    Li, Zijian
    Yang, Junyao
    NEUROCOMPUTING, 2021, 423 : 343 - 352