Modeling Adversarial Attack on Pre-trained Language Models as Sequential Decision Making

Cited by: 0
Authors
Fang, Xuanjie [1 ]
Cheng, Sijie [1 ,2 ,3 ,4 ]
Liu, Yang [2 ,3 ,4 ,5 ]
Wang, Wei [1 ]
Affiliations
[1] Fudan Univ, Sch Comp Sci, Shanghai, Peoples R China
[2] Tsinghua Univ, Inst AI, Dept Comp Sci & Tech, Beijing, Peoples R China
[3] Tsinghua Univ, Inst AI Ind Res AIR, Beijing, Peoples R China
[4] Beijing Natl Res Ctr Informat Sci & Technol, Beijing, Peoples R China
[5] Shanghai Artificial Intelligence Lab, Shanghai, Peoples R China
Funding
National Key R&D Program of China; National Natural Science Foundation of China;
Keywords
DOI
Not available
CLC Number
Subject Classification Code
Abstract
Pre-trained language models (PLMs) have been widely used to underpin various downstream tasks. However, adversarial attack research has shown that PLMs are vulnerable to small perturbations. Mainstream methods adopt a detached two-stage framework to attack, without considering the subsequent influence of each substitution step. In this paper, we formally model the adversarial attack task on PLMs as a sequential decision-making problem, in which the whole attack process is a sequence of two decisions at each step, i.e., word finding and word substitution. Since the attack process can only receive the final state without any direct intermediate signals, we propose to use reinforcement learning to find an appropriate sequential attack path to generate adversaries, named SDM-ATTACK. Extensive experimental results show that SDM-ATTACK achieves the highest attack success rate with a comparable modification rate and semantic similarity when attacking fine-tuned BERT. Furthermore, our analyses demonstrate the generalization and transferability of SDM-ATTACK. The code is available at https://github.com/fduxuan/SDM-Attack.
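To make the abstract's framing concrete, the sketch below illustrates the sequential decision loop it describes: at each step the attacker first decides which word to perturb (word finding), then decides which substitute to use (word substitution), and only the terminal outcome (attack success or failure) provides a reward signal. This is a minimal, hypothetical illustration, not the authors' SDM-ATTACK implementation: AttackPolicy, victim_predict, and get_candidates are assumed names, and the random choices stand in for the learned reinforcement-learning policy.

```python
# Hypothetical sketch of the sequential attack loop from the abstract.
# The names and the random policy are illustrative assumptions, not SDM-ATTACK itself.
import random

class AttackPolicy:
    """Stand-in for the learned RL policy making both per-step decisions."""

    def choose_word(self, tokens, modified):
        # Decision 1 (word finding): pick a position that has not been edited yet.
        # A trained policy would rank positions by expected final attack reward.
        open_positions = [i for i in range(len(tokens)) if not modified[i]]
        return random.choice(open_positions) if open_positions else None

    def choose_substitute(self, tokens, position, candidates):
        # Decision 2 (word substitution): pick one of the candidate replacements.
        return random.choice(candidates)

def sequential_attack(tokens, label, victim_predict, get_candidates,
                      policy, max_steps=20):
    """Edit `tokens` step by step until the victim's prediction flips off `label`.

    Only the terminal state (success or failure) yields a learning signal,
    which is why the paper trains the policy with reinforcement learning.
    """
    tokens = list(tokens)
    modified = [False] * len(tokens)
    for _ in range(max_steps):
        pos = policy.choose_word(tokens, modified)
        if pos is None:
            break
        modified[pos] = True
        candidates = get_candidates(tokens, pos)
        if candidates:
            tokens[pos] = policy.choose_substitute(tokens, pos, candidates)
            if victim_predict(tokens) != label:  # attack succeeded
                return tokens, True
    return tokens, False

# Toy usage: the "victim" predicts 1 exactly when the word "good" is present.
victim = lambda toks: int("good" in toks)
synonyms = lambda toks, i: ["decent", "fine"] if toks[i] == "good" else []
adversary, success = sequential_attack(
    ["a", "good", "movie"], 1, victim, synonyms, AttackPolicy())
print(adversary, success)  # e.g. ['a', 'decent', 'movie'] True
```

In the paper's setting, the uniform random choices above would be replaced by a policy network scoring positions and candidate substitutes, optimized against the final attack reward.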
Pages: 7322 - 7336
Number of pages: 15
Related Papers
50 records in total
  • [21] Pre-trained language models in medicine: A survey
    Luo, Xudong
    Deng, Zhiqi
    Yang, Binxia
    Luo, Michael Y.
    ARTIFICIAL INTELLIGENCE IN MEDICINE, 2024, 154
  • [22] Making Pre-trained Language Models Better Few-shot Learners
    Gao, Tianyu
    Fisch, Adam
    Chen, Danqi
    59TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS AND THE 11TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING (ACL-IJCNLP 2021), VOL 1, 2021: 3816 - 3830
  • [23] CodeBERT-Attack: Adversarial attack against source code deep learning models via pre-trained model
    Zhang, Huangzhao
    Lu, Shuai
    Li, Zhuo
    Jin, Zhi
    Ma, Lei
    Liu, Yang
    Li, Ge
    JOURNAL OF SOFTWARE-EVOLUTION AND PROCESS, 2024, 36 (03)
  • [24] Recommending metamodel concepts during modeling activities with pre-trained language models
    Weyssow, Martin
    Sahraoui, Houari
    Syriani, Eugene
    SOFTWARE AND SYSTEMS MODELING, 2022, 21 (03): 1071 - 1089
  • [25] Unveiling Hidden Variables in Adversarial Attack Transferability on Pre-Trained Models for COVID-19 Diagnosis
    Akhtom, Dua'a
    Singh, Manmeet Mahinderjit
    Xinying, Chew
    INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2024, 15 (11): 1343 - 1350
  • [27] ModelMate: A recommender for textual modeling languages based on pre-trained language models
    Dura Costa, Carlos
    Lopez, Jose Antonio Hernandez
    Sanchez Cuadrado, Jesus
    27TH INTERNATIONAL ACM/IEEE CONFERENCE ON MODEL DRIVEN ENGINEERING LANGUAGES AND SYSTEMS, MODELS, 2024: 183 - 194
  • [28] VLATTACK: Multimodal Adversarial Attacks on Vision-Language Tasks via Pre-trained Models
    Yin, Ziyi
    Ye, Muchao
    Zhang, Tianrong
    Du, Tianyu
    Zhu, Jinguo
    Liu, Han
    Chen, Jinghui
    Wang, Ting
    Ma, Fenglong
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023
  • [29] G-Tuning: Improving Generalization of Pre-trained Language Models with Generative Adversarial Network
    Weng, Rongxiang
    Cheng, Wensen
    Zhang, Min
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, ACL 2023, 2023: 4747 - 4755
  • [30] A Study of Pre-trained Language Models in Natural Language Processing
    Duan, Jiajia
    Zhao, Hui
    Zhou, Qian
    Qiu, Meikang
    Liu, Meiqin
    2020 IEEE INTERNATIONAL CONFERENCE ON SMART CLOUD (SMARTCLOUD 2020), 2020, : 116 - 121