CodeAttack: Code-Based Adversarial Attacks for Pre-trained Programming Language Models

Cited by: 0
Authors
Jha, Akshita [1 ]
Reddy, Chandan K. [1 ]
Affiliations
[1] Virginia Tech, Dept Comp Sci, Arlington, VA 22203 USA
Keywords
DOI
Not available
CLC number
TP18 [Theory of artificial intelligence];
Subject classification codes
081104; 0812; 0835; 1405;
Abstract
Pre-trained programming language (PL) models (such as CodeT5, CodeBERT, and GraphCodeBERT) have the potential to automate software engineering tasks involving code understanding and code generation. However, these models operate in the natural channel of code, i.e., they are primarily concerned with the human understanding of the code. They are not robust to changes in the input and are thus potentially susceptible to adversarial attacks in the natural channel. We propose CodeAttack, a simple yet effective black-box attack model that uses code structure to generate effective, efficient, and imperceptible adversarial code samples, and we demonstrate the vulnerabilities of state-of-the-art PL models to code-specific adversarial attacks. We evaluate the transferability of CodeAttack on several code-code (translation and repair) and code-NL (summarization) tasks across different programming languages. CodeAttack outperforms state-of-the-art adversarial NLP attack models, achieving the best overall drop in performance while being more efficient, imperceptible, consistent, and fluent. The code can be found at https://github.com/reddy-lab-code-research/CodeAttack.
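The abstract only sketches the attack at a high level, so the following Python snippet is a minimal, illustrative sketch of a generic black-box, structure-preserving attack loop under assumptions of our own: identifiers in the input code are renamed (a perturbation that keeps the code syntactically valid), and an edit is kept only if it lowers a black-box score of the victim model's output. The names greedy_attack, candidate_identifiers, ScoreFn, and the toy scorer are hypothetical and are not taken from the released CodeAttack implementation, which selects and substitutes vulnerable tokens in a model-guided, structure-aware way.

import re
import random
from typing import Callable, List

# Hypothetical black-box scorer: maps a code snippet to the quality of the
# victim model's output (e.g., BLEU of a generated summary against a
# reference). It stands in for any pre-trained PL model behind an API.
ScoreFn = Callable[[str], float]

PY_KEYWORDS = {"def", "return", "if", "else", "for", "while", "in", "import",
               "from", "class", "None", "True", "False", "print", "range"}

def candidate_identifiers(code: str) -> List[str]:
    """Collect identifiers that can be renamed without breaking syntax."""
    tokens = set(re.findall(r"\b[A-Za-z_][A-Za-z0-9_]*\b", code))
    return [t for t in tokens if t not in PY_KEYWORDS]

def rename(code: str, old: str, new: str) -> str:
    """Whole-word rename so the perturbation stays syntactically valid."""
    return re.sub(rf"\b{re.escape(old)}\b", new, code)

def greedy_attack(code: str, score: ScoreFn, budget: int = 3, seed: int = 0) -> str:
    """Greedily rename identifiers, keeping a change only if it lowers the
    black-box score; `budget` caps the number of edits so the adversarial
    sample stays close to the original (imperceptibility)."""
    rng = random.Random(seed)
    best_code, best_score = code, score(code)
    names = candidate_identifiers(code)
    for old in rng.sample(names, min(budget, len(names))):
        new = old + "_x"            # crude substitute; the paper's method
                                    # picks replacements using code structure
        perturbed = rename(best_code, old, new)
        s = score(perturbed)
        if s < best_score:          # keep only score-degrading edits
            best_code, best_score = perturbed, s
    return best_code

if __name__ == "__main__":
    snippet = "def add(a, b):\n    total = a + b\n    return total\n"
    # Toy scorer: pretends the model does worse as identifiers get longer.
    toy_score = lambda c: 1.0 / (1.0 + len(c))
    print(greedy_attack(snippet, toy_score))

Against a real pre-trained PL model, score() would wrap a call that evaluates the generated summary, translation, or repair for the perturbed input; the loop itself never inspects model gradients, which is what makes the setting black-box.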
Pages: 14892-14900
Page count: 9
Related papers
50 in total
  • [21] Extending Source Code Pre-Trained Language Models to Summarise Decompiled Binaries
    Al-Kaswan, Ali
    Ahmed, Toufique
    Izadi, Maliheh
    Sawant, Anand Ashok
    Devanbu, Premkumar
    van Deursen, Arie
    2023 IEEE INTERNATIONAL CONFERENCE ON SOFTWARE ANALYSIS, EVOLUTION AND REENGINEERING, SANER, 2023, : 260 - 271
  • [22] Exploring the Potential of Pre-Trained Language Models of Code for Automated Program Repair
    Hao, Sichong
    Shi, Xianjun
    Liu, Hongwei
    ELECTRONICS, 2024, 13 (07)
  • [23] A Data Cartography based MixUp for Pre-trained Language Models
    Park, Seo Yeon
    Caragea, Cornelia
    NAACL 2022: THE 2022 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES, 2022, : 4244 - 4250
  • [24] Model-Agnostic Syntactical Information for Pre-Trained Programming Language Models
    Saberi, Iman
    Fard, Fatemeh H.
    2023 IEEE/ACM 20TH INTERNATIONAL CONFERENCE ON MINING SOFTWARE REPOSITORIES, MSR, 2023, : 183 - 193
  • [25] On the Transferability of Pre-trained Language Models for Low-Resource Programming Languages
    Chen, Fuxiang
    Fard, Fatemeh H.
    Lo, David
    Bryksin, Timofey
    30TH IEEE/ACM INTERNATIONAL CONFERENCE ON PROGRAM COMPREHENSION (ICPC 2022), 2022, : 401 - 412
  • [26] Pre-trained transformer-based language models for Sundanese
    Wilson Wongso
    Henry Lucky
    Derwin Suhartono
    Journal of Big Data, 9
  • [27] Pre-trained transformer-based language models for Sundanese
    Wongso, Wilson
    Lucky, Henry
    Suhartono, Derwin
    JOURNAL OF BIG DATA, 2022, 9 (01)
  • [28] Annotating Columns with Pre-trained Language Models
    Suhara, Yoshihiko
    Li, Jinfeng
    Li, Yuliang
    Zhang, Dan
    Demiralp, Cagatay
    Chen, Chen
    Tan, Wang-Chiew
    PROCEEDINGS OF THE 2022 INTERNATIONAL CONFERENCE ON MANAGEMENT OF DATA (SIGMOD '22), 2022, : 1493 - 1503
  • [29] Pre-trained Adversarial Perturbations
    Ban, Yuanhao
    Dong, Yinpeng
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022
  • [30] LaoPLM: Pre-trained Language Models for Lao
    Lin, Nankai
    Fu, Yingwen
    Yang, Ziyu
    Chen, Chuwei
    Jiang, Shengyi
LREC 2022: THIRTEENTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2022, : 6506 - 6512