Expectation vs. Experience: Evaluating the Usability of Code Generation Tools Powered by Large Language Models

Cited by: 218
Authors
Vaithilingam, Priyan [1 ]
Zhang, Tianyi [2 ]
Glassman, Elena L. [1 ]
Affiliations
[1] Harvard Univ, Cambridge, MA 02138 USA
[2] Purdue Univ, W Lafayette, IN 47907 USA
Keywords
large language model; github copilot;
DOI
10.1145/3491101.3519665
Chinese Library Classification
TP3 [Computing Technology, Computer Technology];
Discipline Code
0812 ;
Abstract
Recent advances in Large Language Models (LLMs) have made automatic code generation possible for real-world programming tasks in general-purpose programming languages such as Python. However, there have been few human studies on the usability of these tools and how they fit into the programming workflow. In this work, we conducted a within-subjects user study with 24 participants to understand how programmers use and perceive Copilot, an LLM-based code generation tool. We found that, while Copilot did not necessarily improve the task completion time or success rate, most participants preferred to use Copilot in daily programming tasks, since Copilot often provided a useful starting point and saved the effort of searching online. However, participants did face difficulties in understanding, editing, and debugging code snippets generated by Copilot, which significantly hindered their task-solving effectiveness. Finally, we highlight several promising directions for improving the design of Copilot based on our observations and participants' feedback.
Pages: 7
Related Papers
50 records
  • [31] Hot or Cold? Adaptive Temperature Sampling for Code Generation with Large Language Models
    Zhu, Yuqi
    Li, Jia
    Li, Ge
    Zhao, YunFei
    Li, Jia
    Jin, Zhi
    Mei, Hong
    THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 1, 2024, : 437 - 445
  • [32] GEE-OPs: An Operator Knowledge Base for Geospatial Code Generation on the Google Earth Engine Platform Powered by Large Language Models
    Hou, Shuyang
    Liang, Jianyuan
    Zhao, Anqi
    Wu, Huayi
    arXiv,
  • [33] Humans vs. large language models: Judgmental forecasting in an era of advanced AI
    Abolghasemi, Mahdi
    Ganbold, Odkhishig
    Rotaru, Kristian
    INTERNATIONAL JOURNAL OF FORECASTING, 2025, 41 (02) : 631 - 648
  • [34] Decoding Stumpers: Large Language Models vs. Human Problem-Solvers
    Goldstein, Alon
    Havin, Miriam
    Reichart, Roi
    Goldstein, Ariel
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (EMNLP 2023), 2023, : 11644 - 11653
  • [35] Is Your Code Generated by ChatGPT Really Correct? Rigorous Evaluation of Large Language Models for Code Generation
    Liu, Jiawei
    Xia, Chunqiu Steven
    Wang, Yuyao
    Zhang, Lingming
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
  • [36] Embodied human language models vs. Large Language Models, or why Artificial Intelligence cannot explain the modal be able to
    Torres-Martinez, Sergio
    BIOSEMIOTICS, 2024, 17 (01) : 185 - 209
  • [38] Ironies of Programming Automation: Exploring the Experience of Code Synthesis via Large Language Models
    McCabe, Alan T.
    Björkman, Moa
    Engström, Joel
    Kuang, Peng
    Söderberg, Emma
    Church, Luke
    PROCEEDINGS OF THE 8TH INTERNATIONAL CONFERENCE ON THE ART, SCIENCE, AND ENGINEERING OF PROGRAMMING, PROGRAMMING COMPANION 2024, 2024, : 12 - 21
  • [39] Enhancing Large Language Models-Based Code Generation by Leveraging Genetic Improvement
    Pinna, Giovanni
    Ravalico, Damiano
    Rovito, Luigi
    Manzoni, Luca
    De Lorenzo, Andrea
    GENETIC PROGRAMMING, EUROGP 2024, 2024, 14631 : 108 - 124
  • [40] Can ChatGPT Support Developers? An Empirical Evaluation of Large Language Models for Code Generation
    Jin, Kailun
    Wang, Chung-Yu
    Hung Viet Pham
    Hemmati, Hadi
    2024 IEEE/ACM 21ST INTERNATIONAL CONFERENCE ON MINING SOFTWARE REPOSITORIES, MSR, 2024, : 167 - 171