LLMs for science: Usage for code generation and data analysis

Cited by: 3
Authors
Nejjar, Mohamed [1]
Zacharias, Luca [1]
Stiehle, Fabian [1]
Weber, Ingo [1, 2]
Affiliations
[1] Tech Univ Munich, Sch Computat Informat & Technol, Munich, Germany
[2] Fraunhofer Gesell, Munich, Germany
Keywords
artificial intelligence; code generation; data analysis; GenAI4Science; large language models; LLMs4Science; research methods;
DOI
10.1002/smr.2723
Chinese Library Classification (CLC) number
TP31 [Computer software];
Discipline classification code
081202 ; 0835 ;
Abstract
Large language models (LLMs) have been touted to enable increased productivity in many areas of today's work life. Scientific research as an area of work is no exception: the potential of LLM-based tools to assist in the daily work of scientists has become a highly discussed topic across disciplines. However, we are only at the very onset of this subject of study. It is still unclear how the potential of LLMs will materialize in research practice. With this study, we give first empirical evidence on the use of LLMs in the research process. We have investigated a set of use cases for LLM-based tools in scientific research and conducted a first study to assess the degree to which current tools are helpful. In this position paper, we report on use cases related to software engineering, specifically on generating application code and developing scripts for data analytics and visualization. While we studied seemingly simple use cases, results across tools differ significantly. Our results highlight the promise of LLM-based tools in general, yet we also observe various issues, particularly regarding the integrity of the output these tools provide.
Pages: 7