Balancing Security and Correctness in Code Generation: An Empirical Study on Commercial Large Language Models

Cited by: 0
Authors
Black, Gavin S. [1 ]
Rimal, Bhaskar P. [2 ]
Vaidyan, Varghese Mathew [1 ]
Affiliations
[1] Dakota State Univ, Beacom Coll Comp & Cyber Sci, Madison, SD 57042 USA
[2] Univ Idaho, Dept Comp Sci, Moscow, ID 83844 USA
Keywords
Codes; Security; Testing; Task analysis; Software; Logic; Computational intelligence; Code generation; code security; CWE; large language models; prompt engineering; vulnerability
DOI
10.1109/TETCI.2024.3446695
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
Large language models (LLMs) continue to be adopted for a multitude of previously manual tasks, with code generation as a prominent use. Multiple commercial models have seen wide adoption due to the accessible nature of their interfaces. Simple prompts can lead to working solutions that save developers time. However, maintaining security in the generated code remains a significant challenge: there are no guarantees of code safety, and LLM responses can readily include known weaknesses. To address this concern, our research examines how different prompt types shape the responses of code-generation tasks toward safer outputs. A set of the most common weaknesses is first elicited through unconditioned prompts that produce vulnerable code across multiple commercial LLMs. These inputs are then paired with different context, role, and identification prompts intended to improve security. Our findings show that including appropriate guidance reduces vulnerabilities in generated code, with the choice of model having the most significant effect. Additionally, timings are presented to demonstrate the efficiency of singular requests that limit the number of model interactions.
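The prompt-pairing procedure summarized above can be illustrated with a minimal sketch. The Python below is hypothetical throughout: the generate function stands in for a single-turn request to any commercial LLM endpoint, and the task and prefix strings are illustrative placeholders, not the study's actual prompts; the per-variant timing mirrors the paper's singular-request measurements.

```python
import time

# Hypothetical stand-in for a single-turn request to a commercial LLM
# endpoint; replace the body with your provider's client call.
def generate(prompt: str) -> str:
    # Offline placeholder so the sketch runs end-to-end without an API key.
    return f"// model output for: {prompt!r}"

# Unconditioned code-generation task of the kind used to elicit code
# containing common weaknesses (CWEs).
TASK = "Write a C function that copies a user-supplied string into a fixed buffer."

# Security-shaping prefixes paired with the task, in the spirit of the
# paper's context, role, and identification prompt types. Wording is
# illustrative, not the authors' exact prompts.
PREFIXES = {
    "unconditioned": "",
    "role": "You are a security-conscious developer. ",
    "context": "This code will run in an internet-facing service. ",
    "identification": "Avoid known CWE weaknesses such as buffer overflows. ",
}

for name, prefix in PREFIXES.items():
    start = time.perf_counter()
    code = generate(prefix + TASK)          # one model interaction per variant
    elapsed = time.perf_counter() - start   # singular-request timing
    print(f"[{name}] {elapsed:.3f}s\n{code}\n")
```

Each variant issues exactly one request, which is what keeps the interaction count (and latency) comparable across prompt types; generated outputs would then be screened against the CWE list to score each variant.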
Pages: 419-430
Page count: 12