CodeBERTScore: Evaluating Code Generation with Pretrained Models of Code

Cited by: 0
Authors
Zhou, Shuyan [1]
Alon, Uri [1,2]
Agarwal, Sumit [1]
Neubig, Graham [1]
Affiliations
[1] Carnegie Mellon Univ, Language Technol Inst, Pittsburgh, PA 15213 USA
[2] Google DeepMind, London, England
Keywords
DOI: Not available
Chinese Library Classification (CLC): TP18 [Artificial Intelligence Theory]
Discipline Classification Codes: 081104; 0812; 0835; 1405
Abstract
Since the rise of neural natural-language-to-code models (NL -> Code) that can generate long expressions and statements rather than a single next-token, one of the major problems has been reliably evaluating their generated output. In this paper, we propose CodeBERTScore: an evaluation metric for code generation, which builds on BERTScore (Zhang et al., 2020). Instead of encoding only the generated tokens as in BERTScore, CodeBERTScore also encodes the natural language input preceding the generated code, thus modeling the consistency between the generated code and its given natural language context as well. We perform an extensive evaluation of CodeBERTScore across four programming languages. We find that CodeBERTScore achieves a higher correlation with human preference and with functional correctness than all existing metrics. That is, generated code that receives a higher score by CodeBERTScore is more likely to be preferred by humans, as well as to function correctly when executed. We release five language-specific pretrained models to use with our publicly available code. Our language-specific models have been downloaded more than 1,000,000 times from the Hugging Face Hub.
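
The token-matching idea the abstract describes can be sketched in a few lines: encode the candidate and the reference together with the preceding natural-language context using a pretrained code encoder, keep only the code tokens' embeddings, and compute BERTScore-style soft precision and recall from pairwise cosine similarities. The sketch below is a minimal illustration under assumptions, not the authors' released implementation; the checkpoint name "neulab/codebert-python" and the helper names encode_code_with_context and soft_f1 are choices made for this example.

import torch
from transformers import AutoTokenizer, AutoModel

# Assumed checkpoint; the paper releases language-specific encoders on the
# Hugging Face Hub, but any code-aware encoder with the same interface works.
MODEL_NAME = "neulab/codebert-python"
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModel.from_pretrained(MODEL_NAME)
model.eval()

def encode_code_with_context(nl_context, code):
    # Encode the NL context and the code as one sequence, then keep only the
    # hidden states of the code tokens: the context conditions the
    # representations but is not itself scored.
    ctx_ids = tokenizer(nl_context, add_special_tokens=False)["input_ids"]
    code_ids = tokenizer(code, add_special_tokens=False)["input_ids"]
    input_ids = torch.tensor([ctx_ids + code_ids])
    with torch.no_grad():
        hidden = model(input_ids).last_hidden_state[0]
    code_vecs = hidden[len(ctx_ids):]
    return torch.nn.functional.normalize(code_vecs, dim=-1)

def soft_f1(nl_context, candidate, reference):
    # BERTScore-style matching: each candidate token is matched to its most
    # similar reference token (precision) and vice versa (recall).
    cand = encode_code_with_context(nl_context, candidate)
    ref = encode_code_with_context(nl_context, reference)
    sim = cand @ ref.T
    precision = sim.max(dim=1).values.mean()
    recall = sim.max(dim=0).values.mean()
    return (2 * precision * recall / (precision + recall)).item()

print(soft_f1("sum the values in a list", "sum(xs)", "return sum(values)"))

The released metric includes additional details that this sketch omits, so scores from the two should not be expected to match exactly.
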
Pages: 13921-13937
Page count: 17
Related Papers (50 total)
  • [11] Invited Paper: VerilogEval: Evaluating Large Language Models for Verilog Code Generation
    Liu, Mingjie
    Pinckney, Nathaniel
    Khailany, Brucek
    Ren, Haoxing
    2023 IEEE/ACM International Conference on Computer Aided Design, ICCAD, 2023
  • [12] Code Difference Guided Adversarial Example Generation for Deep Code Models
    Tian, Zhao
    Chen, Junjie
    Jin, Zhi
    2023 38th IEEE/ACM International Conference on Automated Software Engineering, ASE, 2023: 850-862
  • [13] Code generation from UML models
    Frohner, Ákos
    Porkoláb, Zoltán
    Varga, László
    Periodica Polytechnica Electrical Engineering, 2000, 44(02): 141-157
  • [14] Will they like this? Evaluating Code Contributions With Language Models
    Hellendoorn, Vincent J.
    Devanbu, Premkumar T.
    Bacchelli, Alberto
    12th Working Conference on Mining Software Repositories (MSR 2015), 2015: 157-167
  • [15] JavaBench: A Benchmark of Object-Oriented Code Generation for Evaluating Large Language Models
    Cao, Jialun
    Chen, Zhiyong
    Wu, Jiarong
    Cheung, Shing-Chi
    Xu, Chang
    Proceedings - 2024 39th ACM/IEEE International Conference on Automated Software Engineering, ASE 2024: 870-882
  • [16] VHDL-Eval: A Framework for Evaluating Large Language Models in VHDL Code Generation
    Vijayaraghavan, Prashanth
    Shi, Luyao
    Ambrogio, Stefano
    Mackin, Charles
    Nitsure, Apoorva
    Beymer, David
    Degan, Ehsan
    2024 IEEE LLM Aided Design Workshop, LAD 2024, 2024
  • [17] Evaluating the Performance of Code Generation Models for Solving Parsons Problems With Small Prompt Variations
    Reeves, Brent
    Sarsa, Sami
    Prather, James
    Denny, Paul
    Becker, Brett A.
    Hellas, Arto
    Kimmel, Bailey
    Powell, Garrett
    Leinonen, Juho
    Proceedings of the 2023 Conference on Innovation and Technology in Computer Science Education, ITiCSE 2023, Vol 1, 2023: 299-305
  • [18] A Comprehensive Framework for Evaluating API-oriented Code Generation in Large Language Models
    Wu, Yixi
    He, Pengfei
    Wang, Zehao
    Wang, Shaowei
    Tian, Yuan
    Chen, Tse-Hsun
    arXiv
  • [19] Gotcha! This Model Uses My Code! Evaluating Membership Leakage Risks in Code Models
    Yang, Zhou
    Zhao, Zhipeng
    Wang, Chenyu
    Shi, Jieke
    Kim, Dongsun
    Han, Donggyun
    Lo, David
    IEEE Transactions on Software Engineering, 2024, 50(12): 3290-3306
  • [20] Evaluating Code Comment Generation with Summarized API Docs
    Matmti, Bilel
    Fard, Fatemeh
    Proceedings - 2023 IEEE/ACM 2nd International Workshop on Natural Language-Based Software Engineering, NLBSE 2023, 2023: 60-63