CodeBERTScore: Evaluating Code Generation with Pretrained Models of Code

Cited by: 0
Authors
Zhou, Shuyan [1]
Alon, Uri [1,2]
Agarwal, Sumit [1]
Neubig, Graham [1]
Affiliations
[1] Carnegie Mellon Univ, Language Technol Inst, Pittsburgh, PA 15213 USA
[2] Google DeepMind, London, England
Keywords
DOI: Not available
Chinese Library Classification (CLC): TP18 [Artificial Intelligence Theory]
Discipline Classification Codes: 081104; 0812; 0835; 1405
Abstract
Since the rise of neural natural-language-to-code models (NL -> Code) that can generate long expressions and statements rather than a single next-token, one of the major problems has been reliably evaluating their generated output. In this paper, we propose CodeBERTScore: an evaluation metric for code generation, which builds on BERTScore (Zhang et al., 2020). Instead of encoding only the generated tokens as in BERTScore, CodeBERTScore also encodes the natural language input preceding the generated code, thus modeling the consistency between the generated code and its given natural language context as well. We perform an extensive evaluation of CodeBERTScore across four programming languages. We find that CodeBERTScore achieves a higher correlation with human preference and with functional correctness than all existing metrics. That is, generated code that receives a higher score by CodeBERTScore is more likely to be preferred by humans, as well as to function correctly when executed. We release five language-specific pretrained models to use with our publicly available code. Our language-specific models have been downloaded more than 1,000,000 times from the Hugging Face Hub.
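
The token-matching idea the abstract describes can be sketched in a few lines: encode the candidate and the reference together with the preceding natural-language context using a pretrained code encoder, keep only the code tokens' embeddings, and compute BERTScore-style soft precision and recall from pairwise cosine similarities. The sketch below is a minimal illustration under assumptions, not the authors' released implementation; the checkpoint name "neulab/codebert-python" and the helper names encode_code_with_context and soft_f1 are choices made for this example.

import torch
from transformers import AutoTokenizer, AutoModel

# Assumed checkpoint; the paper releases language-specific encoders on the
# Hugging Face Hub, but any code-aware encoder with the same interface works.
MODEL_NAME = "neulab/codebert-python"
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModel.from_pretrained(MODEL_NAME)
model.eval()

def encode_code_with_context(nl_context, code):
    # Encode the NL context and the code as one sequence, then keep only the
    # hidden states of the code tokens: the context conditions the
    # representations but is not itself scored.
    ctx_ids = tokenizer(nl_context, add_special_tokens=False)["input_ids"]
    code_ids = tokenizer(code, add_special_tokens=False)["input_ids"]
    input_ids = torch.tensor([ctx_ids + code_ids])
    with torch.no_grad():
        hidden = model(input_ids).last_hidden_state[0]
    code_vecs = hidden[len(ctx_ids):]
    return torch.nn.functional.normalize(code_vecs, dim=-1)

def soft_f1(nl_context, candidate, reference):
    # BERTScore-style matching: each candidate token is matched to its most
    # similar reference token (precision) and vice versa (recall).
    cand = encode_code_with_context(nl_context, candidate)
    ref = encode_code_with_context(nl_context, reference)
    sim = cand @ ref.T
    precision = sim.max(dim=1).values.mean()
    recall = sim.max(dim=0).values.mean()
    return (2 * precision * recall / (precision + recall)).item()

print(soft_f1("sum the values in a list", "sum(xs)", "return sum(values)"))

The released metric includes additional details that this sketch omits, so scores from the two should not be expected to match exactly.
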
Pages: 13921-13937
Page count: 17
Related Papers (50 total)
  • [11] Invited Paper: VerilogEval: Evaluating Large Language Models for Verilog Code Generation
    Liu, Mingjie
    Pinckney, Nathaniel
    Khailany, Brucek
    Ren, Haoxing
    2023 IEEE/ACM International Conference on Computer Aided Design, ICCAD, 2023
  • [12] Code Difference Guided Adversarial Example Generation for Deep Code Models
    Tian, Zhao
    Chen, Junjie
    Jin, Zhi
    2023 38th IEEE/ACM International Conference on Automated Software Engineering, ASE, 2023: 850-862
  • [13] Code generation from UML models
    Frohner, Ákos
    Porkoláb, Zoltán
    Varga, László
    Periodica Polytechnica Electrical Engineering, 2000, 44(02): 141-157
  • [14] Will they like this? Evaluating Code Contributions With Language Models
    Hellendoorn, Vincent J.
    Devanbu, Premkumar T.
    Bacchelli, Alberto
    12th Working Conference on Mining Software Repositories (MSR 2015), 2015: 157-167
  • [15] JavaBench: A Benchmark of Object-Oriented Code Generation for Evaluating Large Language Models
    Cao, Jialun
    Chen, Zhiyong
    Wu, Jiarong
    Cheung, Shing-Chi
    Xu, Chang
    Proceedings - 2024 39th ACM/IEEE International Conference on Automated Software Engineering, ASE 2024: 870-882
  • [16] VHDL-Eval: A Framework for Evaluating Large Language Models in VHDL Code Generation
    Vijayaraghavan, Prashanth
    Shi, Luyao
    Ambrogio, Stefano
    Mackin, Charles
    Nitsure, Apoorva
    Beymer, David
    Degan, Ehsan
    2024 IEEE LLM Aided Design Workshop, LAD 2024, 2024
  • [17] Evaluating the Performance of Code Generation Models for Solving Parsons Problems With Small Prompt Variations
    Reeves, Brent
    Sarsa, Sami
    Prather, James
    Denny, Paul
    Becker, Brett A.
    Hellas, Arto
    Kimmel, Bailey
    Powell, Garrett
    Leinonen, Juho
    Proceedings of the 2023 Conference on Innovation and Technology in Computer Science Education, ITiCSE 2023, Vol 1, 2023: 299-305
  • [18] A Comprehensive Framework for Evaluating API-oriented Code Generation in Large Language Models
    Wu, Yixi
    He, Pengfei
    Wang, Zehao
    Wang, Shaowei
    Tian, Yuan
    Chen, Tse-Hsun
    arXiv
  • [19] Gotcha! This Model Uses My Code! Evaluating Membership Leakage Risks in Code Models
    Yang, Zhou
    Zhao, Zhipeng
    Wang, Chenyu
    Shi, Jieke
    Kim, Dongsun
    Han, Donggyun
    Lo, David
    IEEE Transactions on Software Engineering, 2024, 50(12): 3290-3306
  • [20] Evaluating Code Comment Generation with Summarized API Docs
    Matmti, Bilel
    Fard, Fatemeh
    Proceedings - 2023 IEEE/ACM 2nd International Workshop on Natural Language-Based Software Engineering, NLBSE 2023, 2023: 60-63