Expectation vs. Experience: Evaluating the Usability of Code Generation Tools Powered by Large Language Models

Cited by: 218
Authors
Vaithilingam, Priyan [1 ]
Zhang, Tianyi [2 ]
Glassman, Elena L. [1 ]
Affiliations
[1] Harvard Univ, Cambridge, MA 02138 USA
[2] Purdue Univ, W Lafayette, IN 47907 USA
Keywords
large language model; GitHub Copilot
DOI
10.1145/3491101.3519665
CLC number
TP3 [Computing technology; computer technology]
Discipline classification code
0812
Abstract
Recent advances in Large Language Models (LLMs) have made automatic code generation possible for real-world programming tasks in general-purpose programming languages such as Python. However, there are few human studies on the usability of these tools and how they fit into the programming workflow. In this work, we conducted a within-subjects user study with 24 participants to understand how programmers use and perceive Copilot, an LLM-based code generation tool. We found that, while Copilot did not necessarily improve the task completion time or success rate, most participants preferred to use Copilot in daily programming tasks, since Copilot often provided a useful starting point and saved the effort of searching online. However, participants did face difficulties in understanding, editing, and debugging code snippets generated by Copilot, which significantly hindered their task-solving effectiveness. Finally, we highlighted several promising directions for improving the design of Copilot based on our observations and participants' feedback.
Pages: 7
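As context for the abstract, the sketch below is a hypothetical illustration (not drawn from the paper) of the prompt-to-code interaction the study examines: a programmer writes a natural-language comment, and an LLM-based tool such as Copilot proposes a completion that the programmer must then read, edit, and debug. The prompt wording, function name, and completion are invented for illustration; Python is used only because the abstract names it as an example language.

```python
# Hypothetical illustration of Copilot-style code generation (not from the paper).
#
# Programmer-written prompt:
#   "Return the n most frequent words in a text, ignoring case."
#
# A completion of the kind such a tool might suggest. As the study observes,
# suggestions like this are a starting point and may still need editing and debugging.
from collections import Counter
import re


def most_frequent_words(text: str, n: int = 10) -> list[tuple[str, int]]:
    """Return the n most frequent words in text, ignoring case."""
    words = re.findall(r"[a-zA-Z']+", text.lower())
    return Counter(words).most_common(n)


print(most_frequent_words("The cat sat on the mat. The cat slept.", 3))
```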