LLMs for science: Usage for code generation and data analysis

Cited by: 3
Authors
Nejjar, Mohamed [1]
Zacharias, Luca [1]
Stiehle, Fabian [1]
Weber, Ingo [1, 2]
Affiliations
[1] Tech Univ Munich, Sch Computat Informat & Technol, Munich, Germany
[2] Fraunhofer Gesell, Munich, Germany
Keywords
artificial intelligence; code generation; data analysis; GenAI4Science; large language models; LLMs4Science; research methods
DOI
10.1002/smr.2723
Chinese Library Classification number
TP31 [Computer software]
Discipline classification code
081202; 0835
Abstract
Large language models (LLMs) have been touted to enable increased productivity in many areas of today's work life. Scientific research as an area of work is no exception: the potential of LLM-based tools to assist in the daily work of scientists has become a highly discussed topic across disciplines. However, we are only at the very onset of this subject of study. It is still unclear how the potential of LLMs will materialize in research practice. With this study, we give first empirical evidence on the use of LLMs in the research process. We have investigated a set of use cases for LLM-based tools in scientific research and conducted a first study to assess to what degree current tools are helpful. In this position paper, we report on use cases related to software engineering, specifically on generating application code and developing scripts for data analytics and visualization. While we studied seemingly simple use cases, results across tools differ significantly. Our results highlight the promise of LLM-based tools in general, yet we also observe various issues, particularly regarding the integrity of the output these tools provide.
Pages: 7