Towards Efficient Data Wrangling with LLMs using Code Generation

Cited by: 0
Authors
Li, Xue [1, 2]
Dohmen, Till [1]
Affiliations
[1] MotherDuck, Amsterdam, Netherlands
[2] Univ Amsterdam, Amsterdam, Netherlands
Keywords
EXAMPLE
DOI
10.1145/3650203.3663334
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
While LLM-based data wrangling approaches that process each row of data have shown promising benchmark results, computational costs still limit their suitability for real-world use cases on large datasets. We revisit code generation using LLMs for various data wrangling tasks, which shows promising results, particularly for data transformation tasks (up to 37.2 points improvement in F1 score), at much lower computational cost. Furthermore, we identify shortcomings of code generation methods, especially for semantically challenging tasks, and consequently propose an approach that combines program generation with a routing mechanism using LLMs.
Pages: 5