LUOR: A Framework for Language Understanding in Object Retrieval and Grasping

Authors
Yoon, Dongmin [1]
Cha, Seonghun [2]
Oh, Yoonseon [1, 2]
Affiliations
[1] Hanyang Univ, Dept Artificial Intelligence, Seoul 04763, South Korea
[2] Hanyang Univ, Dept Elect Engn, Seoul 04763, South Korea
Funding
National Research Foundation, Singapore
Keywords
Grasp detection; multi-modal learning; robotic object retrieval
DOI
10.1007/s12555-024-0527-7
Chinese Library Classification number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
In human-centered environments, assistive robots are required to understand verbal commands to retrieve and grasp objects within complex scenes. Previous research on natural language object retrieval tasks has mainly focused on commands explicitly mentioning an object's name. However, in real-world environments, responding to implicit commands based on an object's function is also essential. To address this problem, we propose a new dataset consisting of 712 verb-object pairs containing 78 verbs for 244 ImageNet classes and 336 verb-object pairs covering 54 verbs for 138 ObjectNet classes. Utilizing this dataset, we propose a novel language understanding object retrieval (LUOR) module by fine-tuning the CLIP text encoder. This approach enables effective learning for the downstream task of object retrieval while preserving the object classification performance. Additionally, we integrate LUOR with a YOLOv3-based multi-task detection (MTD) module for simultaneous object and grasp pose detection. This integration enables the robot manipulator to accurately grasp objects based on verbal commands in complex environments containing multiple objects. Our results demonstrate that LUOR outperforms CLIP in both explicit and implicit retrieval tasks while preserving object classification accuracy for both the ImageNet and ObjectNet datasets. Also, the real-world applicability of the integrated system is demonstrated through experiments with the Franka Panda manipulator.
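The retrieval step described in the abstract can be pictured as a nearest-neighbor search in a shared text-image embedding space. The following is a minimal sketch under stated assumptions, not the authors' implementation: the function names `cosine_scores` and `retrieve` are hypothetical, and the three-dimensional vectors are hand-made stand-ins for the embeddings that a text encoder and an image encoder would produce for a command and for detected object crops.

```python
import numpy as np


def cosine_scores(text_vec, image_vecs):
    """Cosine similarity between one text embedding and each image embedding."""
    t = text_vec / np.linalg.norm(text_vec)
    v = image_vecs / np.linalg.norm(image_vecs, axis=1, keepdims=True)
    return v @ t


def retrieve(text_vec, image_vecs, labels):
    """Return the label of the best-matching object and all similarity scores."""
    scores = cosine_scores(text_vec, image_vecs)
    best = int(np.argmax(scores))
    return labels[best], scores


# Toy stand-ins (assumption): one row per detected object crop.
labels = ["cup", "scissors", "banana"]
image_vecs = np.array([
    [0.9, 0.1, 0.0],
    [0.1, 0.9, 0.1],
    [0.0, 0.2, 0.9],
])

# Hypothetical embedding of an implicit command such as
# "give me something to cut with".
command_vec = np.array([0.2, 0.8, 0.1])

picked, scores = retrieve(command_vec, image_vecs, labels)
print(picked)
```

In a full system the selected object would then be passed to a grasp pose detector; that stage is omitted here.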
Pages: 530-540
Number of pages: 11