LLMs for science: Usage for code generation and data analysis

Cited by: 3
Authors
Nejjar, Mohamed [1]
Zacharias, Luca [1]
Stiehle, Fabian [1]
Weber, Ingo [1, 2]
Affiliations
[1] Tech Univ Munich, Sch Computat Informat & Technol, Munich, Germany
[2] Fraunhofer Gesell, Munich, Germany
Keywords
artificial intelligence; code generation; data analysis; GenAI4Science; large language models; LLMs4Science; research methods
DOI
10.1002/smr.2723
Chinese Library Classification (CLC) number
TP31 [Computer software]
Discipline classification code
081202; 0835
Abstract
Large language models (LLMs) have been touted to enable increased productivity in many areas of today's work life. Scientific research as an area of work is no exception: the potential of LLM-based tools to assist in the daily work of scientists has become a highly discussed topic across disciplines. However, we are only at the very onset of this subject of study. It is still unclear how the potential of LLMs will materialize in research practice. With this study, we give first empirical evidence on the use of LLMs in the research process. We have investigated a set of use cases for LLM-based tools in scientific research and conducted a first study to assess the degree to which current tools are helpful. In this position paper, we report on use cases related to software engineering, specifically on generating application code and developing scripts for data analytics and visualization. While we studied seemingly simple use cases, results across tools differ significantly. Our results highlight the promise of LLM-based tools in general, yet we also observe various issues, particularly regarding the integrity of the output these tools provide.
Pages: 7
Related papers
50 results in total
  • [21] Will LLMs reshape, supercharge, or kill data science? (VLDB 2023 Panel)
    Halevy, Alon
    Choi, Yejin
    Floratou, Avrilia
    Franklin, Michael J.
    Noy, Natasha
    Wang, Haixun
    PROCEEDINGS OF THE VLDB ENDOWMENT, 2023, 16 (12): 4114-4115
  • [22] Code Needs Comments: Enhancing Code LLMs with Comment Augmentation
    Song, Demin
    Guo, Honglin
    Zhou, Yunhua
    Xing, Shuhao
    Wang, Yudong
    Song, Zifan
    Zhang, Wenwei
    Guo, Qipeng
    Yan, Hang
    Qiu, Xipeng
    Lin, Dahua
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: ACL 2024, 2024: 13640-13656
  • [23] Learning Preference Model for LLMs via Automatic Preference Data Generation
    Huang, Shijia
    Zhao, Jianqiao
    Li, Yanyang
    Wang, Liwei
    2023 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP 2023), 2023: 9187-9199
  • [24] On LLMs-Driven Synthetic Data Generation, Curation, and Evaluation: A Survey
    Long, Lin
    Wang, Rui
    Xiao, Ruixuan
    Zhao, Junbo
    Ding, Xiao
    Chen, Gang
    Wang, Haobo
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: ACL 2024, 2024: 11065-11082
  • [25] PromSec: Prompt Optimization for Secure Generation of Functional Source Code with Large Language Models (LLMs)
    Nazzal, Mahmoud
    Khalil, Issa
    Khreishah, Abdallah
    Phan, NhatHai
    CCS 2024 - Proceedings of the 2024 ACM SIGSAC Conference on Computer and Communications Security: 2266-2279
  • [26] Anonymized Data: Generation, Models, Usage
    Cormode, Graham
    Srivastava, Divesh
    26TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING ICDE 2010, 2010: 1211-1212
  • [27] Anonymized Data: Generation, Models, Usage
    Cormode, Graham
    Srivastava, Divesh
    ACM SIGMOD/PODS 2009 CONFERENCE, 2009: 1015-1018
  • [28] Exploring Metrics for the Analysis of Code Submissions in an Introductory Data Science Course
    Huy Anh Nguyen
    Lim, Michelle
    Moore, Steven
    Nyberg, Eric
    Sakr, Majd
    Stamper, John
    LAK21 CONFERENCE PROCEEDINGS: THE ELEVENTH INTERNATIONAL CONFERENCE ON LEARNING ANALYTICS & KNOWLEDGE, 2021: 632-638
  • [29] On Evaluating the Efficiency of Source Code Generated by LLMs
    Niu, Changan
    Zhang, Ting
    Li, Chuanyi
    Luo, Bin
    Ng, Vincent
    PROCEEDINGS 2024 IEEE/ACM FIRST INTERNATIONAL CONFERENCE ON AI FOUNDATION MODELS AND SOFTWARE ENGINEERING, FORGE 2024, 2024: 103-107
  • [30] Qiskit Code Assistant: Training LLMs for generating Quantum Computing Code
    Dupuis, Nicolas
    Buratti, Luca
    Vishwakarma, Sanjay
    Forrat, Aitana Viudes
    Kremer, David
    Faro, Ismael
    Puri, Ruchir
    Cruz-Benito, Juan
    2024 IEEE LLM AIDED DESIGN WORKSHOP, LAD 2024, 2024