MatPlotAgent: Method and Evaluation for LLM-Based Agentic Scientific Data Visualization

Times cited: 0

Authors:

Yang, Zhiyu [4]

Zhou, Zihan [5]

Wang, Shuo [1]

Cong, Xin [1,2,3]

Han, Xu [1,2,3]

Yan, Yukun [1]

Liu, Zhenghao [6]

Tan, Zhixing [7]

Liu, Pengyuan [4]

Yu, Dong [4]

Liu, Zhiyuan [1,2,3]

Shi, Xiaodong [5]

Sun, Maosong [1,2,3]

Affiliations:

[1] Tsinghua Univ, Dept Comp Sci & Tech, Beijing, Peoples R China

[2] Tsinghua Univ, Inst AI, Beijing, Peoples R China

[3] Beijing Natl Res Ctr Informat Sci & Technol, Beijing, Peoples R China

[4] Beijing Language & Culture Univ, Beijing, Peoples R China

[5] Xiamen Univ, Xiamen, Peoples R China

[6] Northeastern Univ, Shenyang, Peoples R China

[7] Zhongguancun Lab, Beijing, Peoples R China

Source:

FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: ACL 2024 | 2024

Funding:

National Key Research and Development Program of China; National Natural Science Foundation of China

Keywords:

DOI:

Not available

Chinese Library Classification number:

Subject classification number:

Abstract:

Scientific data visualization plays a crucial role in research by enabling the direct display of complex information and helping researchers identify implicit patterns. Despite its importance, the use of Large Language Models (LLMs) for scientific data visualization remains largely unexplored. In this study, we introduce MatPlotAgent, an efficient, model-agnostic LLM agent framework designed to automate scientific data visualization tasks. Leveraging the capabilities of both code LLMs and multi-modal LLMs, MatPlotAgent consists of three core modules: query understanding, code generation with iterative debugging, and a visual feedback mechanism for error correction. To address the lack of benchmarks in this field, we present MatPlotBench, a high-quality benchmark consisting of 100 human-verified test cases. Additionally, we introduce a scoring approach that uses GPT-4V for automatic evaluation. Experimental results demonstrate that MatPlotAgent improves the performance of various LLMs, including both commercial and open-source models. Furthermore, the proposed evaluation method shows a strong correlation with human-annotated scores.
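The abstract names three modules: query understanding, code generation with iterative debugging, and visual feedback. The following is a minimal Python sketch of how such a loop could be wired together, assuming the code LLM and the multimodal critic are supplied as plain callables. The names `plot_agent`, `debug_loop`, `run_script`, `coder`, and `critic`, the prompt wording, and the limits `max_debug` and `max_feedback` are hypothetical illustrations, not MatPlotAgent's actual implementation or API.

```python
import os
import subprocess
import sys
import tempfile
from dataclasses import dataclass
from typing import Callable, Optional, Tuple

# Hypothetical model interfaces (assumptions, not the paper's API):
# a code LLM maps a prompt to text; a multimodal critic maps
# (instructions, image_path) to feedback, where "" means "no changes needed".
CodeModel = Callable[[str], str]
VisualCritic = Callable[[str, str], str]


@dataclass
class AgentResult:
    code: str
    succeeded: bool
    debug_attempts: int
    feedback_rounds: int


def run_script(code: str, timeout: float = 60.0) -> Optional[str]:
    """Run code in a fresh interpreter; return stderr on failure, None on success."""
    with tempfile.TemporaryDirectory() as tmp:
        script = os.path.join(tmp, "plot.py")
        with open(script, "w") as f:
            f.write(code)
        proc = subprocess.run(
            [sys.executable, script], capture_output=True, text=True, timeout=timeout
        )
    return None if proc.returncode == 0 else proc.stderr


def debug_loop(code: str, coder: CodeModel, max_debug: int) -> Tuple[str, Optional[str], int]:
    """Iterative debugging: feed the error output back to the code model until the script runs."""
    attempts = 0
    error = run_script(code)
    while error is not None and attempts < max_debug:
        attempts += 1
        code = coder(f"Fix this code.\nError:\n{error}\nCode:\n{code}")
        error = run_script(code)
    return code, error, attempts


def plot_agent(query: str, coder: CodeModel, critic: VisualCritic,
               image_path: str = "figure.png", max_debug: int = 3,
               max_feedback: int = 1) -> AgentResult:
    # 1. Query understanding: expand the user request into detailed instructions.
    instructions = coder(f"Expand this plotting request into detailed steps:\n{query}")
    # 2. Code generation, then iterative debugging until the script executes.
    draft = coder(f"Write Python code that saves the figure to {image_path}:\n{instructions}")
    code, error, attempts = debug_loop(draft, coder, max_debug)
    # 3. Visual feedback: a multimodal model inspects the rendered figure.
    rounds = 0
    while error is None and rounds < max_feedback:
        feedback = critic(instructions, image_path)
        if not feedback.strip():
            break  # the critic reports no further changes
        rounds += 1
        revised = coder(f"Revise the code using this feedback:\n{feedback}\nCode:\n{code}")
        code, error, more = debug_loop(revised, coder, max_debug)
        attempts += more
    return AgentResult(code, error is None, attempts, rounds)
```

In this sketch, visual feedback is requested only once the script runs cleanly, and every revision passes back through the same debugging loop. Connecting real model clients to `coder` and `critic` is left open.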


Pages: 11789-11804

Number of pages: 16
