Language-Guided Category Push-Grasp Synergy Learning in Clutter by Efficiently Perceiving Object Manipulation Space

Cited by: 0
Authors
Zhao, Min [1 ,2 ]
Zuo, Guoyu [1 ,2 ]
Yu, Shuangyue [1 ,2 ]
Luo, Yongkang [3 ]
Liu, Chunfang [1 ,2 ]
Gong, Daoxiong [1 ,2 ]
Affiliations
[1] Beijing Univ Sci & Technol, Sch Informat Engn, Beijing, Peoples R China
[2] Beijing Key Lab Comp Intelligence & Intelligent Sy, Beijing 100124, Peoples R China
[3] Chinese Acad Sci, Inst Automat, Beijing 100190, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Robots; Grasping; Semantic segmentation; Cognition; Image color analysis; Annotations; Accuracy; Feature extraction; Collision avoidance; Training; Category push-grasp synergy; cluttered scene; language-guided; object manipulation space;
DOI
10.1109/TII.2024.3488774
CLC classification number
TP [Automation Technology; Computer Technology]
Discipline code
0812
Abstract
In flexible manufacturing, robots need to swiftly adapt to constantly changing production tasks. However, it remains a challenging problem for robots to grasp objects of specific categories through language instructions to complete production tasks in cluttered scenes. To address this issue, this article proposes a language-guided category push-grasp synergy network following a cognitive-decision framework. First, inspired by how humans can understand the world through interactions with the environment, we propose an environment state difference embodied self-supervision method that enables robots to autonomously collect embodied multimodal data and generate ground truths that eliminate annotation errors for cognition network training. Second, we develop a language-guided embodied multimodal object cognition network that fuses color and depth image information, enhancing the object cognition ability of robots in cluttered scenes and enabling dynamic semantic segmentation based on language commands. Finally, we propose an object manipulation space metric to measure the manipulable space of target objects, linking the reward function with metric changes before and after actions, thereby enhancing the system's perception of the manipulation space and improving operational performance. Experiments conducted in both simulated and real-world environments demonstrate that our proposed method outperforms existing state-of-the-art methods and can be generalized for grasping novel objects.
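The abstract's third contribution links the reward function to the change in an object manipulation space metric before and after each action. The paper's actual metric and reward are not given in this record, so the following is only an illustrative sketch under assumed definitions: a toy metric that measures the fraction of free cells around a target in a 2-D occupancy grid, with the push reward defined as the metric's increase after the action (all names here, `manipulation_space` and `push_reward`, are hypothetical).

```python
# Illustrative sketch only -- NOT the authors' implementation.
# Toy "manipulation space" metric for a target object in a 2-D
# occupancy grid (0 = free, 1 = occupied), and a reward equal to
# the metric change produced by an action.
import numpy as np


def manipulation_space(occupancy: np.ndarray, target: tuple, radius: int = 2) -> float:
    """Fraction of free cells in a square window centered on the target."""
    r0 = max(target[0] - radius, 0)
    r1 = min(target[0] + radius + 1, occupancy.shape[0])
    c0 = max(target[1] - radius, 0)
    c1 = min(target[1] + radius + 1, occupancy.shape[1])
    window = occupancy[r0:r1, c0:c1]
    return float((window == 0).sum()) / window.size


def push_reward(before: np.ndarray, after: np.ndarray, target: tuple) -> float:
    """Reward a push by how much it enlarges the target's free space."""
    return manipulation_space(after, target) - manipulation_space(before, target)


# A push that clears an obstacle next to the target at (2, 2)
# yields a positive reward; a push that changes nothing yields zero.
grid_before = np.zeros((5, 5), dtype=int)
grid_before[2, 3] = 1                      # obstacle adjacent to the target
grid_after = np.zeros((5, 5), dtype=int)   # obstacle pushed away
print(push_reward(grid_before, grid_after, (2, 2)))  # 0.04
```

The point of tying the reward to the metric difference, as the abstract describes, is that pushes are rewarded only insofar as they make the target easier to grasp, rather than for motion itself.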
Pages: 1783-1792 (10 pages)