Towards Efficient Data Wrangling with LLMs using Code Generation

Cited by: 0

Authors:

Li, Xue [1,2]

Dohmen, Till [1]

Affiliations:

[1] MotherDuck, Amsterdam, Netherlands

[2] Univ Amsterdam, Amsterdam, Netherlands

Source:

PROCEEDINGS OF THE 8TH WORKSHOP ON DATA MANAGEMENT FOR END-TO-END MACHINE LEARNING, DEEM 2024 | 2024

Keywords:

EXAMPLE;

DOI:

10.1145/3650203.3663334

Chinese Library Classification (CLC):

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

While LLM-based data wrangling approaches that process each row of data have shown promising benchmark results, computational costs still limit their suitability for real-world use cases on large datasets. We revisit code generation using LLMs for various data wrangling tasks, which shows promising results, particularly for data transformation tasks (up to 37.2 points improvement in F1 score), at much lower computational costs. We furthermore identify shortcomings of code generation methods, especially for semantically challenging tasks, and consequently propose an approach that combines program generation with a routing mechanism using LLMs.


Pages: 5

