LUOR: A Framework for Language Understanding in Object Retrieval and Grasping

Cited: 0
Authors
Yoon, Dongmin [1 ]
Cha, Seonghun [2 ]
Oh, Yoonseon [1 ,2 ]
Affiliations
[1] Hanyang Univ, Dept Artificial Intelligence, Seoul 04763, South Korea
[2] Hanyang Univ, Dept Elect Engn, Seoul 04763, South Korea
Funding
National Research Foundation of Singapore
Keywords
Grasp detection; multi-modal learning; robotic object retrieval;
DOI
10.1007/s12555-024-0527-7
CLC number
TP [Automation and computer technology]
Discipline classification code
0812
Abstract
In human-centered environments, assistive robots must understand verbal commands to retrieve and grasp objects within complex scenes. Previous research on natural language object retrieval has mainly focused on commands that explicitly mention an object's name. In real-world environments, however, responding to implicit commands based on an object's function is also essential. To address this problem, we introduce a new dataset consisting of 712 verb-object pairs covering 78 verbs for 244 ImageNet classes and 336 verb-object pairs covering 54 verbs for 138 ObjectNet classes. Using this dataset, we propose a novel language understanding object retrieval (LUOR) module built by fine-tuning the CLIP text encoder. This approach enables effective learning of the downstream object retrieval task while preserving object classification performance. Additionally, we integrate LUOR with a YOLOv3-based multi-task detection (MTD) module for simultaneous object and grasp pose detection, enabling a robot manipulator to accurately grasp objects specified by verbal commands in complex environments containing multiple objects. Our results demonstrate that LUOR outperforms CLIP in both explicit and implicit retrieval tasks while preserving object classification accuracy on both the ImageNet and ObjectNet datasets. The real-world applicability of the integrated system is also demonstrated through experiments with a Franka Panda manipulator.
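The retrieval step the abstract describes, scoring candidate objects by the similarity between a command embedding and per-object embeddings, can be sketched in plain Python. This is a minimal illustration of the principle only: the vectors below are made-up placeholders, not CLIP outputs, and a real system would embed the command with the fine-tuned text encoder and the detected objects with an image encoder.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def retrieve(command_embedding, object_embeddings):
    """Return the object name whose embedding best matches the command."""
    return max(
        object_embeddings,
        key=lambda name: cosine_similarity(command_embedding, object_embeddings[name]),
    )

# Placeholder vectors for illustration only (a real pipeline would use
# encoder outputs with hundreds of dimensions).
command = [0.9, 0.1, 0.2]          # e.g., an implicit command like "bring me something to cut with"
objects = {
    "knife":  [0.8, 0.2, 0.1],
    "mug":    [0.1, 0.9, 0.3],
    "remote": [0.2, 0.1, 0.9],
}
print(retrieve(command, objects))  # → knife
```

In the paper's integrated system, the winning object's detection would then be passed to the grasp pose branch; here the example stops at retrieval.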
Pages: 530-540
Number of pages: 11