CodeAttack: Code-Based Adversarial Attacks for Pre-trained Programming Language Models

Cited by: 0
Authors:
Jha, Akshita [1]
Reddy, Chandan K. [1]
Affiliations:
[1] Virginia Tech, Dept Comp Sci, Arlington, VA 22203 USA
Keywords: none listed
DOI: not available
CLC number: TP18 [Artificial intelligence theory]
Discipline codes: 081104; 0812; 0835; 1405
Abstract:
Pre-trained programming language (PL) models (such as CodeT5, CodeBERT, and GraphCodeBERT) have the potential to automate software engineering tasks involving code understanding and code generation. However, these models operate in the natural channel of code, i.e., they are primarily concerned with the human understanding of the code. They are not robust to changes in the input and are thus potentially susceptible to adversarial attacks in the natural channel. We propose CodeAttack, a simple yet effective black-box attack model that uses code structure to generate effective, efficient, and imperceptible adversarial code samples, demonstrating the vulnerability of state-of-the-art PL models to code-specific adversarial attacks. We evaluate the transferability of CodeAttack on several code-code (translation and repair) and code-NL (summarization) tasks across different programming languages. CodeAttack outperforms state-of-the-art adversarial NLP attack models, achieving the largest overall drop in performance while being more efficient, imperceptible, consistent, and fluent. The code can be found at https://github.com/reddy-lab-code-research/CodeAttack.
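The abstract describes CodeAttack only at a high level: a black-box attack that perturbs input code while preserving its structure, so that the adversarial sample remains imperceptible yet degrades the victim model's output. The Python sketch below is a minimal illustration of that general idea, not the authors' released implementation; victim_model, candidate_substitutes, and quality_drop are hypothetical placeholders, and the greedy identifier-substitution search is an assumption rather than the published algorithm (see the linked repository for the actual code).

    # Minimal sketch of a black-box, structure-aware adversarial attack on code.
    # NOT the authors' CodeAttack implementation; victim_model, candidate_substitutes,
    # and quality_drop are hypothetical placeholders used only for illustration.
    import re

    def candidate_substitutes(token):
        """Hypothetical helper: propose near-identical replacement identifiers
        so the edit stays small and the code structure is unchanged."""
        return [token + "_", (token[:-1] or token), token.swapcase()]

    def quality_drop(original_output, perturbed_output):
        """Hypothetical helper: rough proxy for how much the victim model's
        output degraded (here, 1 - token-set overlap between the two outputs)."""
        orig, pert = set(original_output.split()), set(perturbed_output.split())
        return 1.0 - len(orig & pert) / max(len(orig | pert), 1)

    def greedy_code_attack(code, victim_model, max_edits=2):
        """Greedily rename identifiers (keywords and operators are untouched,
        so the code still parses) to maximize the black-box victim's quality drop."""
        keywords = {"def", "return", "if", "else", "for", "while", "in", "import"}
        original_output = victim_model(code)
        adversarial = code
        for _ in range(max_edits):
            tokens = sorted(set(re.findall(r"[A-Za-z_]\w*", adversarial)) - keywords)
            best = None
            for tok in tokens:
                for sub in candidate_substitutes(tok):
                    perturbed = re.sub(rf"\b{re.escape(tok)}\b", sub, adversarial)
                    drop = quality_drop(original_output, victim_model(perturbed))
                    if best is None or drop > best[0]:
                        best = (drop, perturbed)
            if best is None or best[0] == 0.0:
                break
            adversarial = best[1]
        return adversarial

A caller would pass a code snippet and a function wrapping the target model's inference endpoint, e.g. adversarial = greedy_code_attack(code, victim_model); only the model's outputs are queried, which is what makes the attack black-box.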
Pages: 14892-14900
Page count: 9