CLIP feature-based randomized control using images and text for multiple tasks and robots

Cited by: 0
Authors
Shibata, Kazuki [1]
Deguchi, Hideki [1]
Taguchi, Shun [1]
Affiliations
[1] Toyota Cent Res & Dev Labs Inc, Collaborat Intelligence Res Domain, Nagakute, Aichi, Japan
Keywords
Vision-language model; CLIP; randomized controls
DOI
10.1080/01691864.2024.2379381
Chinese Library Classification (CLC) number
TP24 [Robotics]
Discipline classification code
080202; 1405
Abstract
This study presents a control framework that leverages vision-language models (VLMs) for multiple tasks and robots. Existing VLM-based control methods achieve high performance across various tasks and robots in the training environment. However, they incur high costs for learning control policies for tasks and robots outside that environment. For industrial and household robots, learning in each novel environment where a robot is introduced is challenging. To address this issue, we propose a control framework that does not require learning control policies. Our framework combines the vision-language model CLIP with randomized control. CLIP computes the similarity between images and texts by embedding them in a shared feature space. This study employs CLIP to compute the similarity between camera images and text representing the target state. In our method, the robot is controlled by a randomized controller that simultaneously explores and increases the similarity gradients. Moreover, we fine-tune CLIP to improve the performance of the proposed method. We confirm the effectiveness of our approach through a multitask simulation and a real-robot experiment using a two-wheeled robot and a robot arm.
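The idea in the abstract, scoring a camera image against a target-state text and moving the robot randomly toward higher similarity, can be illustrated with a toy sketch. This is not the authors' controller: it substitutes a plain accept-if-not-worse random search, and `toy_image_features`, the three-dimensional feature vectors, and the step size are all hypothetical stand-ins for a real CLIP encoder and robot.

```python
import math
import random


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def toy_image_features(state):
    """Hypothetical stand-in for a CLIP image encoder.

    A real system would encode the camera image; here the "image feature"
    is simply the 2-D robot state with a constant appended.
    """
    return [state[0], state[1], 1.0]


def random_search_step(state, text_features, rng, step=0.1):
    """Propose one random move and keep it only if similarity does not drop.

    This is a simple random search, used as an assumption in place of the
    randomized controller described in the abstract.
    """
    candidate = [s + rng.uniform(-step, step) for s in state]
    current = cosine_similarity(toy_image_features(state), text_features)
    proposed = cosine_similarity(toy_image_features(candidate), text_features)
    return candidate if proposed >= current else state


rng = random.Random(0)
text_features = [1.0, 1.0, 1.0]  # hypothetical embedding of the target-state text
state = [0.0, 0.0]
start_similarity = cosine_similarity(toy_image_features(state), text_features)

for _ in range(200):
    state = random_search_step(state, text_features, rng)

final_similarity = cosine_similarity(toy_image_features(state), text_features)
print(f"similarity: {start_similarity:.3f} -> {final_similarity:.3f}")
```

Because a move is accepted only when the similarity does not decrease, the score is non-decreasing over the run, and no control policy is learned at any point.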
Pages: 1066-1078
Number of pages: 13