Towards Efficient Data Wrangling with LLMs using Code Generation

Cited by: 0
Authors
Li, Xue [1, 2]
Dohmen, Till [1]
Affiliations
[1] MotherDuck, Amsterdam, Netherlands
[2] Univ Amsterdam, Amsterdam, Netherlands
Keywords
EXAMPLE;
DOI
10.1145/3650203.3663334
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
While LLM-based data wrangling approaches that process each row of data have shown promising benchmark results, their computational costs still limit their suitability for real-world use cases on large datasets. We revisit code generation using LLMs for various data wrangling tasks, which shows promising results, particularly for data transformation tasks (an improvement of up to 37.2 points in F1 score), at much lower computational cost. We furthermore identify shortcomings of code generation methods, especially for semantically challenging tasks, and consequently propose an approach that combines program generation with a routing mechanism using LLMs.
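The cost contrast described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: `StubLLM`, its methods, the keyword-based routing rule, and the `wrangle` function are all hypothetical stand-ins that only count how many model calls each path would need.

```python
from typing import Callable, List


class StubLLM:
    """Hypothetical stand-in for an LLM client; it only counts calls."""

    def __init__(self) -> None:
        self.calls = 0

    def route(self, task: str) -> str:
        # Assumed routing step: one call decides which path a task takes.
        # A real router would ask the model; here a keyword check stands in.
        self.calls += 1
        return "code" if "transform" in task else "row"

    def generate_program(self, task: str) -> Callable[[str], str]:
        # Code-generation path: one call per task yields a reusable function.
        self.calls += 1
        return lambda value: value.strip().upper()

    def answer_row(self, task: str, value: str) -> str:
        # Per-row path: one call for every row of data.
        self.calls += 1
        return value.strip().upper()


def wrangle(llm: StubLLM, task: str, rows: List[str]) -> List[str]:
    """Route a task to generated code or to per-row LLM processing."""
    if llm.route(task) == "code":
        program = llm.generate_program(task)
        return [program(row) for row in rows]
    return [llm.answer_row(task, row) for row in rows]
```

In this sketch the code path costs two calls regardless of the number of rows (one to route, one to generate the program), while the per-row path costs one call for routing plus one per row.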
Pages: 5