Modeling Adversarial Attack on Pre-trained Language Models as Sequential Decision Making

Cited by: 0
Authors
Fang, Xuanjie [1 ]
Cheng, Sijie [1 ,2 ,3 ,4 ]
Liu, Yang [2 ,3 ,4 ,5 ]
Wang, Wei [1 ]
Institutions
[1] Fudan Univ, Sch Comp Sci, Shanghai, Peoples R China
[2] Tsinghua Univ, Inst AI, Dept Comp Sci & Tech, Beijing, Peoples R China
[3] Tsinghua Univ, Inst AI Ind Res AIR, Beijing, Peoples R China
[4] Beijing Natl Res Ctr Informat Sci & Technol, Beijing, Peoples R China
[5] Shanghai Artificial Intelligence Lab, Shanghai, Peoples R China
Funding
National Natural Science Foundation of China; National Key Research and Development Program of China
DOI: not available
Abstract
Pre-trained language models (PLMs) have been widely used to underpin various downstream tasks. However, adversarial attack studies have shown that PLMs are vulnerable to small perturbations. Mainstream attack methods adopt a detached two-stage framework and ignore the influence each substitution has on subsequent steps. In this paper, we formally model the adversarial attack task on PLMs as a sequential decision-making problem, in which the whole attack process is a sequence of two decisions at each step, i.e., word finding and word substitution. Since the attack process receives only the final state, with no direct intermediate signals, we propose to use reinforcement learning to find an appropriate sequential attack path for generating adversaries, a method we name SDM-ATTACK. Extensive experimental results show that SDM-ATTACK achieves the highest attack success rate with a comparable modification rate and semantic similarity when attacking fine-tuned BERT. Furthermore, our analyses demonstrate the generalization and transferability of SDM-ATTACK. The code is available at https://github.com/fduxuan/SDM-Attack.
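To make the sequential framing concrete, below is a minimal, hypothetical sketch of the attack loop the abstract describes: at every step the attacker makes a word-finding decision and a word-substitution decision, and the only feedback is the terminal success-or-failure state. All names here (ToyVictim, SYNONYMS, pick_position, attack) are illustrative placeholders, not the authors' SDM-ATTACK API; the real method replaces the random choices with a policy trained by reinforcement learning on the terminal reward.

```python
import random

# Toy synonym table standing in for a real substitution-candidate generator;
# SDM-ATTACK would instead score candidates with a learned policy.
SYNONYMS = {"good": ["great", "fine"], "bad": ["poor", "awful"]}

class ToyVictim:
    """Stand-in for a fine-tuned PLM classifier: labels a sentence 1
    if it contains the word 'good', else 0."""
    def predict(self, text: str) -> int:
        return int("good" in text.split())

def pick_position(words):
    """Word-finding decision. Here: uniform over attackable positions;
    the paper learns this choice with reinforcement learning."""
    candidates = [i for i, w in enumerate(words) if w in SYNONYMS]
    return random.choice(candidates) if candidates else None

def attack(victim, sentence, max_steps=10):
    """Roll out the sequential attack; the only signal is the final
    state (success or failure), matching the terminal-reward setting."""
    words = sentence.split()
    orig_label = victim.predict(sentence)
    for _ in range(max_steps):
        pos = pick_position(words)  # decision 1: which word to perturb
        if pos is None:
            break
        # Decision 2: which replacement to use (here, a random synonym).
        words[pos] = random.choice(SYNONYMS[words[pos]])
        if victim.predict(" ".join(words)) != orig_label:
            return " ".join(words)  # terminal reward 1: adversary found
    return None                     # terminal reward 0: attack failed

if __name__ == "__main__":
    print(attack(ToyVictim(), "the movie was good"))
```

The random policies make the rollout runnable in isolation; the point of the sketch is only the structure of the decision process, in which no per-step reward exists and only the final label flip scores the whole path.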
Pages: 7322-7336 (15 pages)