Language-guided Robot Grasping: CLIP-based Referring Grasp Synthesis in Clutter

Cited by: 0
Authors
Tziafas, Georgios [1 ]
Xu, Yucheng [2 ]
Goel, Arushi [2 ]
Kasaei, Mohammadreza [2 ]
Li, Zhibin [3 ]
Kasaei, Hamidreza [1 ]
Affiliations
[1] Univ Groningen, Groningen, Netherlands
[2] Univ Edinburgh, Edinburgh, Midlothian, Scotland
[3] UCL, London, England
Funding
EU Horizon 2020
Keywords
Language-Guided Robot Grasping; Referring Grasp Synthesis; Visual Grounding
DOI
Not available
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
Robots operating in human-centric environments require the integration of visual grounding and grasping capabilities to effectively manipulate objects based on user instructions. This work focuses on the task of referring grasp synthesis, which predicts a grasp pose for an object referred to through natural language in cluttered scenes. Existing approaches often employ multi-stage pipelines that first segment the referred object and then propose a suitable grasp, and they are evaluated on simple datasets or simulators that do not capture the complexity of natural indoor scenes. To address these limitations, we develop a challenging benchmark based on cluttered indoor scenes from the OCID dataset, for which we generate referring expressions and pair them with 4-DoF grasp poses. Further, we propose a novel end-to-end model (CROG) that leverages the visual grounding capabilities of CLIP to learn grasp synthesis directly from image-text pairs. Our results show that a vanilla integration of CLIP with pretrained models transfers poorly to our challenging benchmark, whereas CROG achieves significant improvements in both grounding and grasping. Extensive robot experiments in simulation and on hardware demonstrate the effectiveness of our approach in challenging interactive object grasping scenarios that include clutter.
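A 4-DoF planar grasp of the kind used in this benchmark is typically parameterized as an image-plane position, an in-plane rotation, and a gripper width. The sketch below shows one common way such a grasp can be decoded from dense per-pixel prediction maps; the map names and the argmax decoding are illustrative assumptions, not CROG's actual architecture.

```python
import numpy as np

def decode_grasp(quality, angle, width):
    """Decode a single 4-DoF grasp (x, y, theta, width) from dense
    prediction maps, a common convention in planar grasp synthesis.
    All inputs are HxW arrays; these names are illustrative only."""
    # Pick the pixel with the highest predicted grasp quality.
    y, x = np.unravel_index(np.argmax(quality), quality.shape)
    # Read the in-plane rotation and gripper width at that pixel.
    return int(x), int(y), float(angle[y, x]), float(width[y, x])

# Toy 4x4 maps: quality peaks at row 1, column 2.
q = np.zeros((4, 4)); q[1, 2] = 1.0
a = np.full((4, 4), 0.5)   # rotation in radians
w = np.full((4, 4), 30.0)  # gripper width in pixels
print(decode_grasp(q, a, w))  # (2, 1, 0.5, 30.0)
```

An end-to-end model like CROG would produce such maps conditioned on both the image and the referring expression, so the highest-quality pixel falls on the referred object rather than on the most graspable object overall.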
Pages: 17