Expectation vs. Experience: Evaluating the Usability of Code Generation Tools Powered by Large Language Models

Cited by: 218
Authors
Vaithilingam, Priyan [1 ]
Zhang, Tianyi [2 ]
Glassman, Elena L. [1 ]
Affiliations
[1] Harvard Univ, Cambridge, MA 02138 USA
[2] Purdue Univ, W Lafayette, IN 47907 USA
Keywords
large language model; GitHub Copilot
DOI
10.1145/3491101.3519665
CLC number
TP3 [Computing technology; computer technology]
Discipline classification code
0812
Abstract
Recent advances in Large Language Models (LLMs) have made automatic code generation possible for real-world programming tasks in general-purpose programming languages such as Python. However, there are few human studies on the usability of these tools and how they fit into the programming workflow. In this work, we conducted a within-subjects user study with 24 participants to understand how programmers use and perceive Copilot, an LLM-based code generation tool. We found that, while Copilot did not necessarily improve the task completion time or success rate, most participants preferred to use Copilot in daily programming tasks, since Copilot often provided a useful starting point and saved the effort of searching online. However, participants did face difficulties in understanding, editing, and debugging code snippets generated by Copilot, which significantly hindered their task-solving effectiveness. Finally, we highlighted several promising directions for improving the design of Copilot based on our observations and participants' feedback.
Pages: 7
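As context for the abstract, the sketch below is a hypothetical illustration (not drawn from the paper) of the prompt-to-code interaction the study examines: a programmer writes a natural-language comment, and an LLM-based tool such as Copilot proposes a completion that the programmer must then read, edit, and debug. The prompt wording, function name, and completion are invented for illustration; Python is used only because the abstract names it as an example language.

```python
# Hypothetical illustration of Copilot-style code generation (not from the paper).
#
# Programmer-written prompt:
#   "Return the n most frequent words in a text, ignoring case."
#
# A completion of the kind such a tool might suggest. As the study observes,
# suggestions like this are a starting point and may still need editing and debugging.
from collections import Counter
import re


def most_frequent_words(text: str, n: int = 10) -> list[tuple[str, int]]:
    """Return the n most frequent words in text, ignoring case."""
    words = re.findall(r"[a-zA-Z']+", text.lower())
    return Counter(words).most_common(n)


print(most_frequent_words("The cat sat on the mat. The cat slept.", 3))
```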