Attention-Based Grasp Detection With Monocular Depth Estimation

Cited by: 1
Authors
Xuan Tan, Phan [1]
Hoang, Dinh-Cuong [2]
Nguyen, Anh-Nhat [3]
Nguyen, Van-Thiep [3]
Vu, Van-Duc [3]
Nguyen, Thu-Uyen [3]
Hoang, Ngoc-Anh [3]
Phan, Khanh-Toan [3]
Tran, Duc-Thanh [3]
Vu, Duy-Quang [3]
Ngo, Phuc-Quan [2]
Duong, Quang-Tri [2]
Ho, Ngoc-Trung [3]
Tran, Cong-Trinh [3]
Duong, Van-Hiep [3]
Mai, Anh-Truong [3]
Affiliations
[1] Shibaura Inst Technol, Coll Engn, Tokyo 1358548, Japan
[2] FPT Univ, Greenwich Vietnam, Hanoi 10000, Vietnam
[3] FPT Univ, IT Dept, Hanoi 10000, Vietnam
Keywords
Pose estimation; robot vision systems; intelligent systems; deep learning; supervised learning; machine vision
DOI
10.1109/ACCESS.2024.3397718
Chinese Library Classification (CLC)
TP [Automation Technology, Computer Technology]
Subject Classification Code
0812
Abstract
Grasp detection plays a pivotal role in robotic manipulation, allowing robots to interact with and manipulate objects in their surroundings. Traditionally, this has relied on three-dimensional (3D) point cloud data acquired from specialized depth cameras. However, the limited availability of such sensors in real-world scenarios poses a significant challenge. In many practical applications, robots operate in diverse environments where obtaining high-quality 3D point cloud data may be impractical or impossible. This paper introduces an innovative approach to grasp generation using color images, thereby eliminating the need for dedicated depth sensors. Our method capitalizes on advanced deep learning techniques for depth estimation directly from color images. Instead of relying on conventional depth sensors, our approach computes predicted point clouds based on estimated depth images derived directly from Red-Green-Blue (RGB) input data. To our knowledge, this is the first study to explore the use of predicted depth data for grasp detection, moving away from the traditional dependence on depth sensors. The novelty of this work is the development of a fusion module that seamlessly integrates features extracted from RGB images with those inferred from the predicted point clouds. Additionally, we adapt a voting mechanism from our previous work (VoteGrasp) to enhance robustness to occlusion and generate collision-free grasps. Experimental evaluations conducted on standard datasets validate the effectiveness of our approach, demonstrating its superior performance in generating grasp configurations compared to existing methods. With our proposed method, we achieved a significant 4% improvement in average precision compared to state-of-the-art grasp detection methods. Furthermore, our method demonstrates promising practical viability through real robot grasping experiments, achieving an impressive 84% success rate.
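A note on the core pipeline step the abstract describes: converting an estimated depth image into a predicted point cloud reduces to standard pinhole back-projection. The following is a minimal illustrative sketch, assuming a NumPy depth map and known camera intrinsics (fx, fy, cx, cy); the depth_net placeholder stands in for any monocular depth-estimation network and is not the paper's actual model.

```python
import numpy as np

def backproject_depth(depth, fx, fy, cx, cy):
    """Back-project an (H, W) depth map into an (N, 3) point cloud using
    the pinhole model: X = (u - cx) * Z / fx, Y = (v - cy) * Z / fy."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # per-pixel image coordinates
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    points = np.stack([x, y, depth], axis=-1).reshape(-1, 3)
    return points[points[:, 2] > 0]  # keep only pixels with valid (positive) depth

# Hypothetical usage -- `depth_net` is a placeholder for any monocular
# depth-estimation network, not the specific model used in the paper:
#   depth = depth_net(rgb)            # (H, W) estimated depth from an RGB image
#   cloud = backproject_depth(depth, fx=615.0, fy=615.0, cx=320.0, cy=240.0)
```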
Pages: 65041 - 65057
Page count: 17
Related Papers
50 items in total
  • [31] Hierarchical Attention-Based Sensor Fusion Strategy for Depth Estimation in Diverse Weather
    Xiong, Mengchen
    Xu, Xiao
    Yang, Dong
    Seguel, Fabian
    Steinbach, Eckehard
    INTERNATIONAL JOURNAL OF SEMANTIC COMPUTING, 2023, 17 (03) : 455 - 475
  • [32] Monocular depth estimation with multi-view attention autoencoder
    Jung, Geunho
    Yoon, Sang Min
    MULTIMEDIA TOOLS AND APPLICATIONS, 2022, 81 (23) : 33759 - 33770
  • [33] Patch-Wise Attention Network for Monocular Depth Estimation
    Lee, Sihaeng
    Lee, Janghyeon
    Kim, Byungju
    Yi, Eojindl
    Kim, Junmo
    THIRTY-FIFTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THIRTY-THIRD CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE AND THE ELEVENTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2021, 35 : 1873 - 1881
  • [34] Monocular Depth Estimation with Optical Flow Attention for Autonomous Drones
    Shimada, Tomoyasu
    Nishikawa, Hiroki
    Kong, Xiangbo
    Tomiyama, Hiroyuki
    2022 19TH INTERNATIONAL SOC DESIGN CONFERENCE (ISOCC), 2022, : 197 - 198
  • [35] DEEP MONOCULAR VIDEO DEPTH ESTIMATION USING TEMPORAL ATTENTION
    Ren, Haoyu
    El-khamy, Mostafa
    Lee, Jungwon
    2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 1988 - 1992
  • [37] MAMo: Leveraging Memory and Attention for Monocular Video Depth Estimation
    Yasarla, Rajeev
    Cai, Hong
    Jeong, Jisoo
    Shi, Yunxiao
    Garrepalli, Risheek
    Porikli, Fatih
    2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2023), 2023, : 8720 - 8730
  • [38] Boosting Monocular Depth Estimation with Channel Attention and Mutual Learning
    Takagi, Kazunari
    Ito, Seiya
    Kaneko, Naoshi
    Sumi, Kazuhiko
    2019 JOINT 8TH INTERNATIONAL CONFERENCE ON INFORMATICS, ELECTRONICS & VISION (ICIEV) AND 2019 3RD INTERNATIONAL CONFERENCE ON IMAGING, VISION & PATTERN RECOGNITION (ICIVPR) WITH INTERNATIONAL CONFERENCE ON ACTIVITY AND BEHAVIOR COMPUTING (ABC), 2019, : 228 - 233
  • [39] Unsupervised Monocular Depth Estimation with Attention Based Inception Pipe and Overlap Regularized Loss
    Jiang, Xiaoyuan
    Chen, Xihai
    Zhang, Zhao
    2021 THE 5TH INTERNATIONAL CONFERENCE ON VIDEO AND IMAGE PROCESSING, ICVIP 2021, 2021, : 44 - 48
  • [40] Monocular Image Depth Estimation Based on Multi-Scale Attention Oriented Network
    Liu J.
    Wen J.
    Liang Y.
    Huanan Ligong Daxue Xuebao/Journal of South China University of Technology (Natural Science), 2020, 48 (12) : 52 - 62