Modeling Adversarial Attack on Pre-trained Language Models as Sequential Decision Making

Cited by: 0
Authors
Fang, Xuanjie [1 ]
Cheng, Sijie [1 ,2 ,3 ,4 ]
Liu, Yang [2 ,3 ,4 ,5 ]
Wang, Wei [1 ]
Affiliations
[1] Fudan Univ, Sch Comp Sci, Shanghai, Peoples R China
[2] Tsinghua Univ, Inst AI, Dept Comp Sci & Tech, Beijing, Peoples R China
[3] Tsinghua Univ, Inst AI Ind Res AIR, Beijing, Peoples R China
[4] Beijing Natl Res Ctr Informat Sci & Technol, Beijing, Peoples R China
[5] Shanghai Artificial Intelligence Lab, Shanghai, Peoples R China
Funding
National Key R&D Program of China; National Natural Science Foundation of China;
Keywords
DOI
Not available
CLC Number
Subject Classification Code
Abstract
Pre-trained language models (PLMs) have been widely used to underpin various downstream tasks. However, adversarial attack research has shown that PLMs are vulnerable to small perturbations. Mainstream methods adopt a detached two-stage framework to attack, without considering the subsequent influence of each substitution step. In this paper, we formally model the adversarial attack task on PLMs as a sequential decision-making problem, in which the whole attack process is a sequence of two decisions at each step, i.e., word finding and word substitution. Since the attack process can only receive the final state without any direct intermediate signals, we propose to use reinforcement learning to find an appropriate sequential attack path to generate adversaries, named SDM-ATTACK. Extensive experimental results show that SDM-ATTACK achieves the highest attack success rate with a comparable modification rate and semantic similarity when attacking fine-tuned BERT. Furthermore, our analyses demonstrate the generalization and transferability of SDM-ATTACK. The code is available at https://github.com/fduxuan/SDM-Attack.
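To make the abstract's framing concrete, the sketch below illustrates the sequential decision loop it describes: at each step the attacker first decides which word to perturb (word finding), then decides which substitute to use (word substitution), and only the terminal outcome (attack success or failure) provides a reward signal. This is a minimal, hypothetical illustration, not the authors' SDM-ATTACK implementation: AttackPolicy, victim_predict, and get_candidates are assumed names, and the random choices stand in for the learned reinforcement-learning policy.

```python
# Hypothetical sketch of the sequential attack loop from the abstract.
# The names and the random policy are illustrative assumptions, not SDM-ATTACK itself.
import random

class AttackPolicy:
    """Stand-in for the learned RL policy making both per-step decisions."""

    def choose_word(self, tokens, modified):
        # Decision 1 (word finding): pick a position that has not been edited yet.
        # A trained policy would rank positions by expected final attack reward.
        open_positions = [i for i in range(len(tokens)) if not modified[i]]
        return random.choice(open_positions) if open_positions else None

    def choose_substitute(self, tokens, position, candidates):
        # Decision 2 (word substitution): pick one of the candidate replacements.
        return random.choice(candidates)

def sequential_attack(tokens, label, victim_predict, get_candidates,
                      policy, max_steps=20):
    """Edit `tokens` step by step until the victim's prediction flips off `label`.

    Only the terminal state (success or failure) yields a learning signal,
    which is why the paper trains the policy with reinforcement learning.
    """
    tokens = list(tokens)
    modified = [False] * len(tokens)
    for _ in range(max_steps):
        pos = policy.choose_word(tokens, modified)
        if pos is None:
            break
        modified[pos] = True
        candidates = get_candidates(tokens, pos)
        if candidates:
            tokens[pos] = policy.choose_substitute(tokens, pos, candidates)
            if victim_predict(tokens) != label:  # attack succeeded
                return tokens, True
    return tokens, False

# Toy usage: the "victim" predicts 1 exactly when the word "good" is present.
victim = lambda toks: int("good" in toks)
synonyms = lambda toks, i: ["decent", "fine"] if toks[i] == "good" else []
adversary, success = sequential_attack(
    ["a", "good", "movie"], 1, victim, synonyms, AttackPolicy())
print(adversary, success)  # e.g. ['a', 'decent', 'movie'] True
```

In the paper's setting, the uniform random choices above would be replaced by a policy network scoring positions and candidate substitutes, optimized against the final attack reward.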
Pages: 7322 - 7336
Number of pages: 15
Related Papers
50 records in total
  • [21] Pre-trained language models in medicine: A survey
    Luo, Xudong
    Deng, Zhiqi
    Yang, Binxia
    Luo, Michael Y.
    ARTIFICIAL INTELLIGENCE IN MEDICINE, 2024, 154
  • [22] Making Pre-trained Language Models Better Few-shot Learners
    Gao, Tianyu
    Fisch, Adam
    Chen, Danqi
    59TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS AND THE 11TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING (ACL-IJCNLP 2021), VOL 1, 2021: 3816 - 3830
  • [23] CodeBERT-Attack: Adversarial attack against source code deep learning models via pre-trained model
    Zhang, Huangzhao
    Lu, Shuai
    Li, Zhuo
    Jin, Zhi
    Ma, Lei
    Liu, Yang
    Li, Ge
    JOURNAL OF SOFTWARE-EVOLUTION AND PROCESS, 2024, 36 (03)
  • [24] Recommending metamodel concepts during modeling activities with pre-trained language models
    Weyssow, Martin
    Sahraoui, Houari
    Syriani, Eugene
    SOFTWARE AND SYSTEMS MODELING, 2022, 21 (03): 1071 - 1089
  • [25] Unveiling Hidden Variables in Adversarial Attack Transferability on Pre-Trained Models for COVID-19 Diagnosis
    Akhtom, Dua'a
    Singh, Manmeet Mahinderjit
    Xinying, Chew
    INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2024, 15 (11): 1343 - 1350
  • [27] ModelMate: A recommender for textual modeling languages based on pre-trained language models
    Dura Costa, Carlos
    Lopez, Jose Antonio Hernandez
    Sanchez Cuadrado, Jesus
    27TH INTERNATIONAL ACM/IEEE CONFERENCE ON MODEL DRIVEN ENGINEERING LANGUAGES AND SYSTEMS, MODELS, 2024: 183 - 194
  • [28] VLATTACK: Multimodal Adversarial Attacks on Vision-Language Tasks via Pre-trained Models
    Yin, Ziyi
    Ye, Muchao
    Zhang, Tianrong
    Du, Tianyu
    Zhu, Jinguo
    Liu, Han
    Chen, Jinghui
    Wang, Ting
    Ma, Fenglong
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023
  • [29] G-Tuning: Improving Generalization of Pre-trained Language Models with Generative Adversarial Network
    Weng, Rongxiang
    Cheng, Wensen
    Zhang, Min
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, ACL 2023, 2023: 4747 - 4755
  • [30] A Study of Pre-trained Language Models in Natural Language Processing
    Duan, Jiajia
    Zhao, Hui
    Zhou, Qian
    Qiu, Meikang
    Liu, Meiqin
    2020 IEEE INTERNATIONAL CONFERENCE ON SMART CLOUD (SMARTCLOUD 2020), 2020, : 116 - 121