LUOR: A Framework for Language Understanding in Object Retrieval and Grasping

Cited by: 0
Authors
Yoon, Dongmin [1]
Cha, Seonghun [2]
Oh, Yoonseon [1, 2]
Affiliations
[1] Hanyang Univ, Dept Artificial Intelligence, Seoul 04763, South Korea
[2] Hanyang Univ, Dept Elect Engn, Seoul 04763, South Korea
Funding
National Research Foundation, Singapore;
Keywords
Grasp detection; multi-modal learning; robotic object retrieval;
DOI
10.1007/s12555-024-0527-7
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Subject classification code
0812;
Abstract
In human-centered environments, assistive robots are required to understand verbal commands to retrieve and grasp objects within complex scenes. Previous research on natural language object retrieval tasks has mainly focused on commands explicitly mentioning an object's name. However, in real-world environments, responding to implicit commands based on an object's function is also essential. To address this problem, we propose a new dataset consisting of 712 verb-object pairs containing 78 verbs for 244 ImageNet classes and 336 verb-object pairs covering 54 verbs for 138 ObjectNet classes. Utilizing this dataset, we propose a novel language understanding object retrieval (LUOR) module by fine-tuning the CLIP text encoder. This approach enables effective learning for the downstream task of object retrieval while preserving the object classification performance. Additionally, we integrate LUOR with a YOLOv3-based multi-task detection (MTD) module for simultaneous object and grasp pose detection. This integration enables the robot manipulator to accurately grasp objects based on verbal commands in complex environments containing multiple objects. Our results demonstrate that LUOR outperforms CLIP in both explicit and implicit retrieval tasks while preserving object classification accuracy for both the ImageNet and ObjectNet datasets. Also, the real-world applicability of the integrated system is demonstrated through experiments with the Franka Panda manipulator.
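The retrieval step described in the abstract (matching a verbal command to one of several detected objects, then using that object's grasp pose) can be illustrated with a minimal sketch. This is not the authors' implementation: the three-dimensional embeddings are invented toy values standing in for text-encoder outputs, and the names `cosine`, `retrieve`, `detections`, and the `(x, y, angle)` grasp tuple are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def retrieve(command_vec, detections):
    """Return the detection whose label embedding is closest to the command."""
    return max(detections, key=lambda d: cosine(command_vec, d["embedding"]))


# Toy data (assumption): each detection carries a label embedding and a
# grasp pose (x, y, angle), as a multi-task detector might output.
detections = [
    {"label": "cup", "embedding": [0.9, 0.1, 0.0], "grasp": (120, 80, 30.0)},
    {"label": "scissors", "embedding": [0.1, 0.9, 0.1], "grasp": (200, 150, -45.0)},
    {"label": "banana", "embedding": [0.0, 0.2, 0.9], "grasp": (60, 210, 10.0)},
]

# Stand-in for the embedding of an implicit command such as
# "give me something to cut with" (invented values).
command_vec = [0.2, 0.8, 0.1]

target = retrieve(command_vec, detections)
print(target["label"], target["grasp"])
```

In this toy setup the implicit command lands closest to the "scissors" embedding, so its grasp pose is the one that would be passed on to the manipulator.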
Pages: 530-540
Page count: 11