Balancing Security and Correctness in Code Generation: An Empirical Study on Commercial Large Language Models

Cited: 0
Authors
Black, Gavin S. [1]
Rimal, Bhaskar P. [2]
Vaidyan, Varghese Mathew [1]
Affiliations
[1] Dakota State Univ, Beacom Coll Comp & Cyber Sci, Madison, SD 57042 USA
[2] Univ Idaho, Dept Comp Sci, Moscow, ID 83844 USA
Keywords
Codes; Security; Testing; Task analysis; Software; Logic; Computational intelligence; Code generation; code security; CWE; large language models; prompt engineering; vulnerability
DOI
10.1109/TETCI.2024.3446695
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Large language models (LLMs) continue to be adopted for a multitude of previously manual tasks, with code generation as a prominent use. Multiple commercial models have seen wide adoption due to the accessible nature of their interfaces. Simple prompts can lead to working solutions that save developers time. However, the generated code poses a significant security challenge: there are no guarantees of code safety, and LLM responses can readily include known weaknesses. To address this concern, our research examines how different prompt types shape the responses to code-generation tasks so as to produce safer outputs. Unconditioned prompts are first used to elicit vulnerable code exhibiting the most common weaknesses (CWEs) across multiple commercial LLMs. These inputs are then paired with different context, role, and identification prompts intended to improve security. Our findings show that including appropriate guidance reduces vulnerabilities in generated code, with the choice of model having the most significant effect. Additionally, timings are presented to demonstrate the efficiency of single requests that limit the number of model interactions.
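The methodology the abstract describes lends itself to a brief illustration. The sketch below is not from the paper; the guidance strings, the example task (chosen to tend toward CWE-89, SQL injection), and the `send_prompt` stub are all hypothetical stand-ins for the study's actual prompts and commercial LLM APIs. It shows how an unconditioned code-generation task might be paired with the context, role, and identification guidance styles the study compares, issuing and timing a single request per variant.

```python
# A minimal sketch (not the authors' code) of composing the prompt variants
# the abstract names. All strings and the send_prompt stub are hypothetical.

import time

# Base code-generation task, modeled on an "unconditioned" prompt that
# tends to elicit a known weakness (here, CWE-89: SQL injection).
BASE_TASK = "Write a Python function that looks up a user by name in a SQLite database."

# Hypothetical examples of the three guidance styles compared in the study,
# plus the unconditioned baseline.
GUIDANCE = {
    "unconditioned": "",
    "context": "The code will run in a production web service handling untrusted input. ",
    "role": "You are a secure-coding expert who always avoids common CWE weaknesses. ",
    "identification": "Identify and avoid any potential vulnerabilities (e.g., SQL injection) in your answer. ",
}

def build_prompt(variant: str) -> str:
    """Prefix the base task with one guidance style."""
    return GUIDANCE[variant] + BASE_TASK

def send_prompt(prompt: str) -> str:
    """Placeholder for a call to a commercial LLM API; replace with a real client."""
    return f"<model response to: {prompt!r}>"

if __name__ == "__main__":
    # One combined request per variant keeps model interactions to a single
    # round trip, mirroring the efficiency argument for singular requests.
    for variant in GUIDANCE:
        start = time.perf_counter()
        response = send_prompt(build_prompt(variant))
        elapsed = time.perf_counter() - start
        print(f"{variant:>15}: {elapsed * 1e3:.2f} ms (stubbed)")
```

Swapping the `send_prompt` stub for a real API client would reproduce the structure of the experiment: one model interaction per variant, timed end to end.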
Pages: 419-430
Page count: 12