LLMs for science: Usage for code generation and data analysis

Cited by: 3

Authors:

Nejjar, Mohamed [1]

Zacharias, Luca [1]

Stiehle, Fabian [1]

Weber, Ingo [1,2]

Affiliations:

[1] Tech Univ Munich, Sch Computat Informat & Technol, Munich, Germany

[2] Fraunhofer Gesell, Munich, Germany

Source:

JOURNAL OF SOFTWARE-EVOLUTION AND PROCESS | 2025 / Vol. 37 / Issue 01

Keywords:

artificial intelligence; code generation; data analysis; GenAI4Science; large language models; LLMs4Science; research methods

DOI:

10.1002/smr.2723

Chinese Library Classification (CLC):

TP31 [Computer software]

Discipline classification codes:

081202; 0835

Abstract:

Large language models (LLMs) have been touted to enable increased productivity in many areas of today's work life. Scientific research as an area of work is no exception: the potential of LLM-based tools to assist in the daily work of scientists has become a highly discussed topic across disciplines. However, we are only at the very onset of this subject of study. It is still unclear how the potential of LLMs will materialize in research practice. With this study, we give first empirical evidence on the use of LLMs in the research process. We have investigated a set of use cases for LLM-based tools in scientific research and conducted a first study to assess to what degree current tools are helpful. In this position paper, we report on use cases related to software engineering, specifically on generating application code and developing scripts for data analytics and visualization. While we studied seemingly simple use cases, results across tools differ significantly. Our results highlight the promise of LLM-based tools in general, yet we also observe various issues, particularly regarding the integrity of the output these tools provide.


Pages: 7

