Balancing Security and Correctness in Code Generation: An Empirical Study on Commercial Large Language Models

Cited: 0
Authors
Black, Gavin S. [1]
Rimal, Bhaskar P. [2]
Vaidyan, Varghese Mathew [1]
Affiliations
[1] Dakota State Univ, Beacom Coll Comp & Cyber Sci, Madison, SD 57042 USA
[2] Univ Idaho, Dept Comp Sci, Moscow, ID 83844 USA
Keywords
Codes; Security; Testing; Task analysis; Software; Logic; Computational intelligence; Code generation; code security; CWE; large language models; prompt engineering; vulnerability
DOI
10.1109/TETCI.2024.3446695
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Large language models (LLMs) continue to be adopted for a multitude of previously manual tasks, with code generation as a prominent use. Multiple commercial models have seen wide adoption due to the accessible nature of their interfaces. Simple prompts can lead to working solutions that save developers time. However, the generated code poses a significant security challenge: there are no guarantees of code safety, and LLM responses can readily include known weaknesses. To address this concern, our research examines how different prompt types shape the responses to code-generation tasks so as to produce safer outputs. Unconditioned prompts are first used to elicit vulnerable code exhibiting the most common weaknesses (CWEs) across multiple commercial LLMs. These inputs are then paired with different context, role, and identification prompts intended to improve security. Our findings show that including appropriate guidance reduces vulnerabilities in generated code, with the choice of model having the most significant effect. Additionally, timings are presented to demonstrate the efficiency of single requests that limit the number of model interactions.
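The methodology the abstract describes lends itself to a brief illustration. The sketch below is not from the paper; the guidance strings, the example task (chosen to tend toward CWE-89, SQL injection), and the `send_prompt` stub are all hypothetical stand-ins for the study's actual prompts and commercial LLM APIs. It shows how an unconditioned code-generation task might be paired with the context, role, and identification guidance styles the study compares, issuing and timing a single request per variant.

```python
# A minimal sketch (not the authors' code) of composing the prompt variants
# the abstract names. All strings and the send_prompt stub are hypothetical.

import time

# Base code-generation task, modeled on an "unconditioned" prompt that
# tends to elicit a known weakness (here, CWE-89: SQL injection).
BASE_TASK = "Write a Python function that looks up a user by name in a SQLite database."

# Hypothetical examples of the three guidance styles compared in the study,
# plus the unconditioned baseline.
GUIDANCE = {
    "unconditioned": "",
    "context": "The code will run in a production web service handling untrusted input. ",
    "role": "You are a secure-coding expert who always avoids common CWE weaknesses. ",
    "identification": "Identify and avoid any potential vulnerabilities (e.g., SQL injection) in your answer. ",
}

def build_prompt(variant: str) -> str:
    """Prefix the base task with one guidance style."""
    return GUIDANCE[variant] + BASE_TASK

def send_prompt(prompt: str) -> str:
    """Placeholder for a call to a commercial LLM API; replace with a real client."""
    return f"<model response to: {prompt!r}>"

if __name__ == "__main__":
    # One combined request per variant keeps model interactions to a single
    # round trip, mirroring the efficiency argument for singular requests.
    for variant in GUIDANCE:
        start = time.perf_counter()
        response = send_prompt(build_prompt(variant))
        elapsed = time.perf_counter() - start
        print(f"{variant:>15}: {elapsed * 1e3:.2f} ms (stubbed)")
```

Swapping the `send_prompt` stub for a real API client would reproduce the structure of the experiment: one model interaction per variant, timed end to end.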
Pages: 419-430
Page count: 12