LLMs for science: Usage for code generation and data analysis

Cited by: 3

Authors:

Nejjar, Mohamed [1]

Zacharias, Luca [1]

Stiehle, Fabian [1]

Weber, Ingo [1, 2]

Affiliations:

[1] Tech Univ Munich, Sch Computat Informat & Technol, Munich, Germany

[2] Fraunhofer Gesell, Munich, Germany

Source:

JOURNAL OF SOFTWARE-EVOLUTION AND PROCESS | 2025 / Vol. 37 / No. 01

Keywords:

artificial intelligence; code generation; data analysis; GenAI4Science; large language models; LLMs4Science; research methods;

DOI:

10.1002/smr.2723

Chinese Library Classification (CLC) number:

TP31 [Computer software];

Subject classification code:

081202 ; 0835 ;

Abstract:

Large language models (LLMs) have been touted as enabling increased productivity in many areas of today's work life. Scientific research as an area of work is no exception: the potential of LLM-based tools to assist scientists in their daily work has become a highly discussed topic across disciplines. However, we are only at the very onset of studying this subject, and it is still unclear how the potential of LLMs will materialize in research practice. With this study, we provide first empirical evidence on the use of LLMs in the research process. We have investigated a set of use cases for LLM-based tools in scientific research and conducted a first study to assess to what degree current tools are helpful. In this position paper, we report on use cases related to software engineering, specifically on generating application code and developing scripts for data analytics and visualization. Although the use cases we studied are seemingly simple, results differ significantly across tools. Our results highlight the promise of LLM-based tools in general, yet we also observe various issues, particularly regarding the integrity of the output these tools provide.

Pages: 7