GPT-4V(ision) for Robotics: Multimodal Task Planning From Human Demonstration

Cited by: 2
Authors
Wake, Naoki [1]
Kanehira, Atsushi [1]
Sasabuchi, Kazuhiro [1]
Takamatsu, Jun [1]
Ikeuchi, Katsushi [1]
Affiliations
[1] Microsoft, Appl Robot Res, Redmond, WA 98052 USA
Keywords
Robots; Affordances; Pipelines; Planning; Collision avoidance; Visualization; Machine vision; Grounding; Data models; Training; Task and motion planning; task planning; imitation learning
DOI
10.1109/LRA.2024.3477090
Chinese Library Classification (CLC) number
TP24 [Robotics]
Discipline classification code
080202; 1405
Abstract
We introduce a pipeline that enhances a general-purpose Vision Language Model, GPT-4V(ision), to facilitate one-shot visual teaching for robotic manipulation. The system analyzes videos of humans performing tasks and outputs executable robot programs that incorporate insights into affordances. The process begins with GPT-4V analyzing the videos to obtain textual explanations of environmental and action details. A GPT-4-based task planner then encodes these details into a symbolic task plan. Subsequently, vision systems spatially and temporally ground the task plan in the videos: objects are identified using an open-vocabulary object detector, and hand-object interactions are analyzed to pinpoint the moments of grasping and releasing. This spatiotemporal grounding allows for the gathering of affordance information (e.g., grasp types, waypoints, and body postures) critical for robot execution. Experiments across various scenarios demonstrate the method's efficacy in enabling real robots to operate from one-shot human demonstrations. Meanwhile, quantitative tests revealed instances of hallucination in GPT-4V, highlighting the importance of incorporating human supervision within the pipeline.
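The temporal-grounding step mentioned in the abstract (pinpointing the moments of grasping and releasing) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a hypothetical upstream hand-object interaction detector has already produced one boolean contact flag per video frame, and the function name `find_grasp_release` is invented for this illustration.

```python
from typing import List, Optional, Tuple


def find_grasp_release(contact: List[bool]) -> Tuple[Optional[int], Optional[int]]:
    """Return (grasp_frame, release_frame) from per-frame contact flags.

    ASSUMPTION: `contact[i]` is True when a hypothetical hand-object
    detector reports the hand touching the object in frame i.

    grasp_frame is the first frame with contact; release_frame is the
    first frame without contact after the grasp. Either is None if the
    corresponding event never occurs.
    """
    grasp: Optional[int] = None
    release: Optional[int] = None
    for i, touching in enumerate(contact):
        if grasp is None:
            if touching:
                grasp = i          # first frame where contact begins
        elif not touching:
            release = i            # first frame where contact ends
            break
    return grasp, release


flags = [False, False, True, True, True, False, False]
print(find_grasp_release(flags))  # (2, 5)
```

In a real video, contact flags are noisy, so a practical version would likely smooth them or require several consecutive frames before accepting a transition. That refinement is omitted here for brevity.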
Pages: 10567-10574
Number of pages: 8