Towards Efficient Data Wrangling with LLMs using Code Generation

Cited by: 0

Authors:

Li, Xue [1,2]

Dohmen, Till [1]

Affiliations:

[1] MotherDuck, Amsterdam, Netherlands

[2] Univ Amsterdam, Amsterdam, Netherlands

Source:

PROCEEDINGS OF THE 8TH WORKSHOP ON DATA MANAGEMENT FOR END-TO-END MACHINE LEARNING, DEEM 2024 | 2024

Keywords:

EXAMPLE;

DOI:

10.1145/3650203.3663334

Chinese Library Classification (CLC):

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

While LLM-based data wrangling approaches that process each row of data have shown promising benchmark results, computational costs still limit their suitability for real-world use cases on large datasets. We revisit code generation using LLMs for various data wrangling tasks, which shows promising results, particularly for data transformation tasks (up to 37.2 points improvement in F1 score), at much lower computational cost. We furthermore identify shortcomings of code generation methods, especially for semantically challenging tasks, and consequently propose an approach that combines program generation with a routing mechanism using LLMs.
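The cost argument in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: `StubLLM`, its methods, the keyword-based routing rule, and `wrangle` are all hypothetical names introduced here, and the stub counts model calls instead of contacting a real model. The only point it makes is that a generated program costs one model call regardless of table size, while row-wise processing costs one call per row.

```python
from typing import Callable, List


class StubLLM:
    """Hypothetical stand-in for an LLM client; it only counts calls."""

    def __init__(self) -> None:
        self.calls = 0

    def route(self, task: str) -> str:
        # Assumption: a keyword rule stands in for an LLM routing decision.
        self.calls += 1
        return "row" if "semantic" in task else "program"

    def generate_program(self, task: str) -> Callable[[str], str]:
        # One call yields a function that is reused for every row.
        self.calls += 1
        return lambda value: value.strip().title()

    def transform_row(self, value: str) -> str:
        # Row-wise processing: one call per value.
        self.calls += 1
        return value.strip().title()


def wrangle(llm: StubLLM, task: str, rows: List[str]) -> List[str]:
    """Route the task, then apply either a generated program or per-row calls."""
    if llm.route(task) == "program":
        program = llm.generate_program(task)
        return [program(row) for row in rows]
    return [llm.transform_row(row) for row in rows]
```

With 100 rows, the program path in this sketch makes two stub calls (one to route, one to generate), whereas the row-wise path makes 101. The real routing criteria and prompts are described in the paper, not here.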


Pages: 5
