CodeAttack: Code-Based Adversarial Attacks for Pre-trained Programming Language Models

Cited by: 0
Authors
Jha, Akshita [1 ]
Reddy, Chandan K. [1 ]
Affiliations
[1] Virginia Tech, Dept Comp Sci, Arlington, VA 22203 USA
Keywords
DOI
Not available
CLC number
TP18 [Theory of artificial intelligence];
Subject classification codes
081104; 0812; 0835; 1405;
Abstract
Pre-trained programming language (PL) models (such as CodeT5, CodeBERT, and GraphCodeBERT) have the potential to automate software engineering tasks involving code understanding and code generation. However, these models operate in the natural channel of code, i.e., they are primarily concerned with the human understanding of the code. They are not robust to changes in the input and are thus potentially susceptible to adversarial attacks in the natural channel. We propose CodeAttack, a simple yet effective black-box attack model that uses code structure to generate effective, efficient, and imperceptible adversarial code samples, and we demonstrate the vulnerabilities of state-of-the-art PL models to code-specific adversarial attacks. We evaluate the transferability of CodeAttack on several code-code (translation and repair) and code-NL (summarization) tasks across different programming languages. CodeAttack outperforms state-of-the-art adversarial NLP attack models, achieving the best overall drop in performance while being more efficient, imperceptible, consistent, and fluent. The code can be found at https://github.com/reddy-lab-code-research/CodeAttack.
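The abstract only sketches the attack at a high level, so the following Python snippet is a minimal, illustrative sketch of a generic black-box, structure-preserving attack loop under assumptions of our own: identifiers in the input code are renamed (a perturbation that keeps the code syntactically valid), and an edit is kept only if it lowers a black-box score of the victim model's output. The names greedy_attack, candidate_identifiers, ScoreFn, and the toy scorer are hypothetical and are not taken from the released CodeAttack implementation, which selects and substitutes vulnerable tokens in a model-guided, structure-aware way.

import re
import random
from typing import Callable, List

# Hypothetical black-box scorer: maps a code snippet to the quality of the
# victim model's output (e.g., BLEU of a generated summary against a
# reference). It stands in for any pre-trained PL model behind an API.
ScoreFn = Callable[[str], float]

PY_KEYWORDS = {"def", "return", "if", "else", "for", "while", "in", "import",
               "from", "class", "None", "True", "False", "print", "range"}

def candidate_identifiers(code: str) -> List[str]:
    """Collect identifiers that can be renamed without breaking syntax."""
    tokens = set(re.findall(r"\b[A-Za-z_][A-Za-z0-9_]*\b", code))
    return [t for t in tokens if t not in PY_KEYWORDS]

def rename(code: str, old: str, new: str) -> str:
    """Whole-word rename so the perturbation stays syntactically valid."""
    return re.sub(rf"\b{re.escape(old)}\b", new, code)

def greedy_attack(code: str, score: ScoreFn, budget: int = 3, seed: int = 0) -> str:
    """Greedily rename identifiers, keeping a change only if it lowers the
    black-box score; `budget` caps the number of edits so the adversarial
    sample stays close to the original (imperceptibility)."""
    rng = random.Random(seed)
    best_code, best_score = code, score(code)
    names = candidate_identifiers(code)
    for old in rng.sample(names, min(budget, len(names))):
        new = old + "_x"            # crude substitute; the paper's method
                                    # picks replacements using code structure
        perturbed = rename(best_code, old, new)
        s = score(perturbed)
        if s < best_score:          # keep only score-degrading edits
            best_code, best_score = perturbed, s
    return best_code

if __name__ == "__main__":
    snippet = "def add(a, b):\n    total = a + b\n    return total\n"
    # Toy scorer: pretends the model does worse as identifiers get longer.
    toy_score = lambda c: 1.0 / (1.0 + len(c))
    print(greedy_attack(snippet, toy_score))

Against a real pre-trained PL model, score() would wrap a call that evaluates the generated summary, translation, or repair for the perturbed input; the loop itself never inspects model gradients, which is what makes the setting black-box.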
Pages: 14892-14900
Page count: 9
Related papers
50 in total
  • [21] Extending Source Code Pre-Trained Language Models to Summarise Decompiled Binaries
    Al-Kaswan, Ali
    Ahmed, Toufique
    Izadi, Maliheh
    Sawant, Anand Ashok
    Devanbu, Premkumar
    van Deursen, Arie
    2023 IEEE INTERNATIONAL CONFERENCE ON SOFTWARE ANALYSIS, EVOLUTION AND REENGINEERING, SANER, 2023, : 260 - 271
  • [22] Exploring the Potential of Pre-Trained Language Models of Code for Automated Program Repair
    Hao, Sichong
    Shi, Xianjun
    Liu, Hongwei
    ELECTRONICS, 2024, 13 (07)
  • [23] A Data Cartography based MixUp for Pre-trained Language Models
    Park, Seo Yeon
    Caragea, Cornelia
    NAACL 2022: THE 2022 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES, 2022, : 4244 - 4250
  • [24] Model-Agnostic Syntactical Information for Pre-Trained Programming Language Models
    Saberi, Iman
    Fard, Fatemeh H.
    2023 IEEE/ACM 20TH INTERNATIONAL CONFERENCE ON MINING SOFTWARE REPOSITORIES, MSR, 2023, : 183 - 193
  • [25] On the Transferability of Pre-trained Language Models for Low-Resource Programming Languages
    Chen, Fuxiang
    Fard, Fatemeh H.
    Lo, David
    Bryksin, Timofey
    30TH IEEE/ACM INTERNATIONAL CONFERENCE ON PROGRAM COMPREHENSION (ICPC 2022), 2022, : 401 - 412
  • [26] Pre-trained transformer-based language models for Sundanese
    Wilson Wongso
    Henry Lucky
    Derwin Suhartono
    Journal of Big Data, 9
  • [27] Pre-trained transformer-based language models for Sundanese
    Wongso, Wilson
    Lucky, Henry
    Suhartono, Derwin
    JOURNAL OF BIG DATA, 2022, 9 (01)
  • [28] Annotating Columns with Pre-trained Language Models
    Suhara, Yoshihiko
    Li, Jinfeng
    Li, Yuliang
    Zhang, Dan
    Demiralp, Cagatay
    Chen, Chen
    Tan, Wang-Chiew
    PROCEEDINGS OF THE 2022 INTERNATIONAL CONFERENCE ON MANAGEMENT OF DATA (SIGMOD '22), 2022, : 1493 - 1503
  • [29] Pre-trained Adversarial Perturbations
    Ban, Yuanhao
    Dong, Yinpeng
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022
  • [30] LaoPLM: Pre-trained Language Models for Lao
    Lin, Nankai
    Fu, Yingwen
    Yang, Ziyu
    Chen, Chuwei
    Jiang, Shengyi
LREC 2022: THIRTEENTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2022, : 6506 - 6512