Expectation vs. Experience: Evaluating the Usability of Code Generation Tools Powered by Large Language Models

Cited by: 218
Authors
Vaithilingam, Priyan [1 ]
Zhang, Tianyi [2 ]
Glassman, Elena L. [1 ]
Affiliations
[1] Harvard Univ, Cambridge, MA 02138 USA
[2] Purdue Univ, W Lafayette, IN 47907 USA
Keywords
large language model; github copilot;
DOI
10.1145/3491101.3519665
Chinese Library Classification
TP3 [Computing Technology, Computer Technology];
Discipline Code
0812 ;
Abstract
Recent advances in Large Language Models (LLMs) have made automatic code generation possible for real-world programming tasks in general-purpose programming languages such as Python. However, there have been few human studies on the usability of these tools and how they fit into the programming workflow. In this work, we conducted a within-subjects user study with 24 participants to understand how programmers use and perceive Copilot, an LLM-based code generation tool. We found that, while Copilot did not necessarily improve the task completion time or success rate, most participants preferred to use Copilot in daily programming tasks, since Copilot often provided a useful starting point and saved the effort of searching online. However, participants did face difficulties in understanding, editing, and debugging code snippets generated by Copilot, which significantly hindered their task-solving effectiveness. Finally, we highlight several promising directions for improving the design of Copilot based on our observations and participants' feedback.
Pages: 7
Related Papers
50 records
  • [31] Hot or Cold? Adaptive Temperature Sampling for Code Generation with Large Language Models
    Zhu, Yuqi
    Li, Jia
    Li, Ge
    Zhao, YunFei
    Li, Jia
    Jin, Zhi
    Mei, Hong
    THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 1, 2024, : 437 - 445
  • [32] GEE-OPs: An Operator Knowledge Base for Geospatial Code Generation on the Google Earth Engine Platform Powered by Large Language Models
    Hou, Shuyang
    Liang, Jianyuan
    Zhao, Anqi
    Wu, Huayi
    arXiv,
  • [33] Humans vs. large language models: Judgmental forecasting in an era of advanced AI
    Abolghasemi, Mahdi
    Ganbold, Odkhishig
    Rotaru, Kristian
    INTERNATIONAL JOURNAL OF FORECASTING, 2025, 41 (02) : 631 - 648
  • [34] Decoding Stumpers: Large Language Models vs. Human Problem-Solvers
    Goldstein, Alon
    Havin, Miriam
    Reichart, Roi
    Goldstein, Ariel
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (EMNLP 2023), 2023, : 11644 - 11653
  • [35] Is Your Code Generated by ChatGPT Really Correct? Rigorous Evaluation of Large Language Models for Code Generation
    Liu, Jiawei
    Xia, Chunqiu Steven
    Wang, Yuyao
    Zhang, Lingming
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
  • [36] Embodied human language models vs. Large Language Models, or why Artificial Intelligence cannot explain the modal be able to
    Torres-Martinez, Sergio
    BIOSEMIOTICS, 2024, 17 (01) : 185 - 209
  • [38] Ironies of Programming Automation: Exploring the Experience of Code Synthesis via Large Language Models
    McCabe, Alan T.
    Björkman, Moa
    Engström, Joel
    Kuang, Peng
    Söderberg, Emma
    Church, Luke
    PROCEEDINGS OF THE 8TH INTERNATIONAL CONFERENCE ON THE ART, SCIENCE, AND ENGINEERING OF PROGRAMMING, PROGRAMMING COMPANION 2024, 2024, : 12 - 21
  • [39] Enhancing Large Language Models-Based Code Generation by Leveraging Genetic Improvement
    Pinna, Giovanni
    Ravalico, Damiano
    Rovito, Luigi
    Manzoni, Luca
    De Lorenzo, Andrea
    GENETIC PROGRAMMING, EUROGP 2024, 2024, 14631 : 108 - 124
  • [40] Can ChatGPT Support Developers? An Empirical Evaluation of Large Language Models for Code Generation
    Jin, Kailun
    Wang, Chung-Yu
    Hung Viet Pham
    Hemmati, Hadi
    2024 IEEE/ACM 21ST INTERNATIONAL CONFERENCE ON MINING SOFTWARE REPOSITORIES, MSR, 2024, : 167 - 171