Expectation vs. Experience: Evaluating the Usability of Code Generation Tools Powered by Large Language Models

Cited by: 218
Authors
Vaithilingam, Priyan [1 ]
Zhang, Tianyi [2 ]
Glassman, Elena L. [1 ]
Affiliations
[1] Harvard Univ, Cambridge, MA 02138 USA
[2] Purdue Univ, W Lafayette, IN 47907 USA
Keywords
large language model; GitHub Copilot
DOI
10.1145/3491101.3519665
CLC number (Chinese Library Classification)
TP3 [Computing Technology, Computer Technology]
Subject classification code
0812
Abstract
Recent advances in Large Language Models (LLMs) have made automatic code generation possible for real-world programming tasks in general-purpose programming languages such as Python. However, there have been few human studies on the usability of these tools and how they fit into the programming workflow. In this work, we conducted a within-subjects user study with 24 participants to understand how programmers use and perceive Copilot, an LLM-based code generation tool. We found that, while Copilot did not necessarily improve task completion time or success rate, most participants preferred to use Copilot in daily programming tasks, since it often provided a useful starting point and saved the effort of searching online. However, participants faced difficulties in understanding, editing, and debugging the code snippets Copilot generated, which significantly hindered their task-solving effectiveness. Finally, we highlight several promising directions for improving the design of Copilot based on our observations and participants' feedback.
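To make the interaction concrete, the sketch below is a hypothetical illustration (not an example taken from the paper) of the comment-driven workflow the study examines: a programmer writes a natural-language prompt as a Python comment, and a Copilot-style tool proposes a completion that serves as a starting point, which the programmer must still review, edit, and debug.

    # Prompt written by the programmer:
    # return the median of a list of numbers
    def median(values):
        # Completion of the kind an LLM-based assistant might suggest;
        # the programmer must still verify it (e.g., behavior on empty input).
        ordered = sorted(values)
        n = len(ordered)
        mid = n // 2
        if n % 2 == 1:
            return ordered[mid]
        return (ordered[mid - 1] + ordered[mid]) / 2

This review-and-edit step is precisely where the study found participants struggling, even when the generated snippet was a useful starting point.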
Pages: 7
Related papers (50 in total; first 10 shown)
  • [1] Framework for evaluating code generation ability of large language models
    Yeo, Sangyeop
    Ma, Yu-Seung
    Kim, Sang Cheol
    Jun, Hyungkook
    Kim, Taeho
    ETRI JOURNAL, 2024, 46 (01) : 106 - 117
  • [2] Evaluating the Language Abilities of Large Language Models vs. Humans: Three Caveats
    Leivada, Evelina
    Dentella, Vittoria
    Guenther, Fritz
    BIOLINGUISTICS, 2024, 18
  • [3] Invited Paper: VerilogEval: Evaluating Large Language Models for Verilog Code Generation
    Liu, Mingjie
    Pinckney, Nathaniel
    Khailany, Brucek
    Ren, Haoxing
2023 IEEE/ACM INTERNATIONAL CONFERENCE ON COMPUTER AIDED DESIGN, ICCAD, 2023
  • [4] JavaBench: A Benchmark of Object-Oriented Code Generation for Evaluating Large Language Models
    Cao, Jialun
    Chen, Zhiyong
    Wu, Jiarong
    Cheung, Shing-Chi
    Xu, Chang
PROCEEDINGS OF THE 39TH ACM/IEEE INTERNATIONAL CONFERENCE ON AUTOMATED SOFTWARE ENGINEERING, ASE 2024: 870 - 882
  • [5] VHDL-Eval: A Framework for Evaluating Large Language Models in VHDL Code Generation
    Vijayaraghavan, Prashanth
    Shi, Luyao
    Ambrogio, Stefano
    Mackin, Charles
    Nitsure, Apoorva
    Beymer, David
    Degan, Ehsan
2024 IEEE LLM AIDED DESIGN WORKSHOP, LAD 2024
  • [6] A Comprehensive Framework for Evaluating API-oriented Code Generation in Large Language Models
    Wu, Yixi
    He, Pengfei
    Wang, Zehao
    Wang, Shaowei
    Tian, Yuan
    Chen, Tse-Hsun
arXiv
  • [7] L2CEval: Evaluating Language-to-Code Generation Capabilities of Large Language Models
    Ni, Ansong
    Yin, Pengcheng
    Zhao, Yilun
    Riddell, Martin
    Feng, Troy
    Shen, Rui
    Yin, Stephen
    Liu, Ye
    Yavuz, Semih
    Xiong, Caiming
    Joty, Shafiq
    Zhou, Yingbo
    Radev, Dragomir
    Cohan, Arman
    TRANSACTIONS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, 2024, 12 : 1311 - 1329
  • [8] Quantifying Contamination in Evaluating Code Generation Capabilities of Language Models
    Riddell, Martin
    Ni, Ansong
    Cohan, Arman
    PROCEEDINGS OF THE 62ND ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, VOL 1: LONG PAPERS, 2024, : 14116 - 14137
  • [9] Evaluating Large Language Models on Controlled Generation Tasks
    Sun, Jiao
    Tian, Yufei
    Zhou, Wangchunshu
    Xu, Nan
    Hu, Qian
    Gupta, Rahul
    Wieting, John
    Peng, Nanyun
    Ma, Xuezhe
    2023 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING, EMNLP 2023, 2023, : 3155 - 3168
  • [10] Natural Language Dataset Generation Framework for Visualizations Powered by Large Language Models
    Ko, Hyung-Kwon
    Jeon, Hyeon
    Park, Gwanmo
    Kim, Dae Hyun
    Kim, Nam Wook
    Kim, Juho
    Seo, Jinwook
PROCEEDINGS OF THE 2024 CHI CONFERENCE ON HUMAN FACTORS IN COMPUTING SYSTEMS (CHI 2024), 2024