Towards Efficient Data Wrangling with LLMs using Code Generation

Cited by: 0
Authors
Li, Xue [1, 2]
Dohmen, Till [1]
Affiliations
[1] MotherDuck, Amsterdam, Netherlands
[2] Univ Amsterdam, Amsterdam, Netherlands
Keywords
EXAMPLE;
DOI
10.1145/3650203.3663334
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
While LLM-based data wrangling approaches that process each row of data have shown promising benchmark results, computational costs still limit their suitability for real-world use cases on large datasets. We revisit code generation using LLMs for various data wrangling tasks, which shows promising results, particularly for data transformation tasks (up to 37.2 points improvement in F1 score), at much lower computational cost. We furthermore identify shortcomings of code generation methods, especially for semantically challenging tasks, and consequently propose an approach that combines program generation with a routing mechanism using LLMs.
Pages: 5