Expectation vs. Experience: Evaluating the Usability of Code Generation Tools Powered by Large Language Models

Cited by: 218

Authors:

Vaithilingam, Priyan [1]

Zhang, Tianyi [2]

Glassman, Elena L. [1]

Affiliations:

[1] Harvard University, Cambridge, MA 02138, USA

[2] Purdue University, West Lafayette, IN 47907, USA

Source:

EXTENDED ABSTRACTS OF THE 2022 CHI CONFERENCE ON HUMAN FACTORS IN COMPUTING SYSTEMS, CHI 2022 | 2022

Keywords:

large language model; GitHub Copilot

DOI:

10.1145/3491101.3519665

Chinese Library Classification (CLC) number:

TP3 [Computing technology, computer technology]

Discipline classification code:

0812

Abstract:

Recent advances in large language models (LLMs) have made automatic code generation possible for real-world programming tasks in general-purpose programming languages such as Python. Yet few human studies have examined the usability of these tools and how they fit into the programming workflow. In this work, we conducted a within-subjects user study with 24 participants to understand how programmers use and perceive Copilot, an LLM-based code generation tool. We found that, while Copilot did not necessarily improve task completion time or success rate, most participants preferred to use Copilot for daily programming tasks, since it often provided a useful starting point and saved the effort of searching online. However, participants did face difficulties in understanding, editing, and debugging code snippets generated by Copilot, which significantly hindered their task-solving effectiveness. Finally, we highlight several promising directions for improving the design of Copilot based on our observations and participants' feedback.


Pages: 7
