Towards Efficient Data Wrangling with LLMs using Code Generation

Authors
Li, Xue [1, 2]
Dohmen, Till [1]
Affiliations
[1] MotherDuck, Amsterdam, Netherlands
[2] Univ Amsterdam, Amsterdam, Netherlands
Keywords
EXAMPLE;
DOI
10.1145/3650203.3663334
Chinese Library Classification number
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
While LLM-based data wrangling approaches that process each row of data have shown promising benchmark results, computational costs still limit their suitability for real-world use cases on large datasets. We revisit code generation using LLMs for various data wrangling tasks, which shows promising results, particularly for data transformation tasks (up to 37.2 points improvement in F1 score), at much lower computational cost. We furthermore identify shortcomings of code generation methods, especially for semantically challenging tasks, and consequently propose an approach that combines program generation with a routing mechanism using LLMs.
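The cost contrast described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract gives no implementation details, so `FakeLLM`, `wrangle`, the `ROUTE`/`GENERATE`/`ROW` prompt prefixes, and the canned responses are all hypothetical stand-ins. The sketch assumes a router that asks the model once whether a task suits code generation. If so, one further call produces a function that is applied to every row locally. Otherwise the task falls back to one model call per row.

```python
from typing import Dict, List


class FakeLLM:
    """Hypothetical stand-in for an LLM client; counts calls to expose cost."""

    def __init__(self) -> None:
        self.calls = 0

    def complete(self, prompt: str) -> str:
        self.calls += 1
        if prompt.startswith("ROUTE"):
            # Assumption: pure format changes are judged suitable for code.
            return "code" if "reformat" in prompt else "row"
        if prompt.startswith("GENERATE"):
            # Canned "generated program" for the date example below.
            return (
                "def transform(value):\n"
                "    d, m, y = value.split('/')\n"
                "    return f'{y}-{m}-{d}'"
            )
        # Per-row fallback: a placeholder answer for semantic tasks.
        return "unknown"


def wrangle(llm: FakeLLM, task: str, rows: List[str]) -> List[str]:
    """Route a task to code generation (2 calls) or per-row prompting (1 + N calls)."""
    route = llm.complete(f"ROUTE: {task}")
    if route == "code":
        source = llm.complete(f"GENERATE: {task}")
        namespace: Dict[str, object] = {}
        # Generated code should be sandboxed in any real deployment.
        exec(source, namespace)
        transform = namespace["transform"]
        return [transform(row) for row in rows]
    return [llm.complete(f"ROW: {task}\n{row}") for row in rows]


if __name__ == "__main__":
    llm = FakeLLM()
    dates = ["31/12/2023", "01/02/2024"]
    print(wrangle(llm, "reformat dates from DD/MM/YYYY to YYYY-MM-DD", dates))
    print("LLM calls:", llm.calls)
```

In this toy setup the code-generation route costs two model calls regardless of the number of rows, while the per-row route costs one call per row plus the routing call, which is the scaling difference the abstract points to.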
Pages: 5