CLOVA: A Closed-LOop Visual Assistant with Tool Usage and Update

Cited by: 1
Authors
Gao, Zhi [1, 2]
Du, Yuntao [2]
Zhang, Xintong [2, 3]
Ma, Xiaojian [2]
Han, Wenjuan [3]
Zhu, Song-Chun [1, 2, 4]
Li, Qing [2]
Affiliations
[1] Peking Univ, Sch Intelligence Sci & Technol, Beijing, Peoples R China
[2] BIGAI, State Key Lab Gen Artificial Intelligence, Beijing, Peoples R China
[3] Beijing Jiaotong Univ, Beijing, Peoples R China
[4] Tsinghua Univ, Dept Automat, Beijing, Peoples R China
DOI
10.1109/CVPR52733.2024.01259
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Utilizing large language models (LLMs) to compose off-the-shelf visual tools represents a promising avenue of research for developing robust visual assistants capable of addressing diverse visual tasks. However, these methods often overlook the potential for continual learning, typically by freezing the utilized tools, thus limiting their adaptation to environments requiring new knowledge. To tackle this challenge, we propose CLOVA, a Closed-LOop Visual Assistant, which operates within a framework encompassing inference, reflection, and learning phases. During the inference phase, LLMs generate programs and execute corresponding tools to complete assigned tasks. In the reflection phase, a multimodal global-local reflection scheme analyzes human feedback to determine which tools require updating. Lastly, the learning phase employs three flexible approaches to automatically gather training data and introduces a novel prompt tuning scheme to update the tools, allowing CLOVA to efficiently acquire new knowledge. Experimental findings demonstrate that CLOVA surpasses existing tool-usage methods by 5% in visual question answering and multiple-image reasoning, by 10% in knowledge tagging, and by 20% in image editing. These results underscore the significance of the continual learning capability in general visual assistants.
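The three phases named in the abstract can be pictured with a toy control loop. The sketch below is only an illustration under strong assumptions and is not the authors' implementation: the `Tool` class, the fixed `program` list (standing in for an LLM-generated program), the per-tool `expected` feedback dictionary, and the dictionary-based `update` (standing in for prompt tuning) are all hypothetical names and simplifications introduced here.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Tool:
    """Hypothetical stand-in for a visual tool; `memory` mimics learned knowledge."""
    name: str
    memory: Dict[str, str] = field(default_factory=dict)

    def run(self, query: str) -> str:
        # Return the stored answer, or "unknown" if the tool lacks the knowledge.
        return self.memory.get(query, "unknown")

    def update(self, examples: Dict[str, str]) -> None:
        # Placeholder for the learning phase (assumption: a plain dict update
        # replaces the prompt tuning described in the abstract).
        self.memory.update(examples)


def inference(program: List[str], tools: Dict[str, Tool], query: str) -> Dict[str, str]:
    """Inference phase: run each tool named in the program and record its output."""
    return {name: tools[name].run(query) for name in program}


def reflection(outputs: Dict[str, str], expected: Dict[str, str]) -> List[str]:
    """Reflection phase: list the tools whose output disagrees with the feedback."""
    return [name for name, out in outputs.items() if out != expected[name]]


def closed_loop(program: List[str], tools: Dict[str, Tool],
                query: str, expected: Dict[str, str]) -> Dict[str, str]:
    """Run inference, reflect on feedback, update faulty tools, then re-run."""
    outputs = inference(program, tools, query)
    faulty = reflection(outputs, expected)
    for name in faulty:
        # Learning phase: only the tools flagged by reflection are updated.
        tools[name].update({query: expected[name]})
    return inference(program, tools, query) if faulty else outputs


if __name__ == "__main__":
    tools = {"detector": Tool("detector", {"img1": "cat"}), "tagger": Tool("tagger")}
    feedback = {"detector": "cat", "tagger": "tabby"}
    print(closed_loop(["detector", "tagger"], tools, "img1", feedback))
```

The point of the sketch is the control flow: tools that already answer correctly are left untouched, and only those singled out during reflection receive new knowledge before the task is retried.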
Pages: 13258-13268
Page count: 11