Modeling Adversarial Attack on Pre-trained Language Models as Sequential Decision Making

Cited by: 0
Authors
Fang, Xuanjie [1 ]
Cheng, Sijie [1 ,2 ,3 ,4 ]
Liu, Yang [2 ,3 ,4 ,5 ]
Wang, Wei [1 ]
Institutions
[1] Fudan Univ, Sch Comp Sci, Shanghai, Peoples R China
[2] Tsinghua Univ, Inst AI, Dept Comp Sci & Tech, Beijing, Peoples R China
[3] Tsinghua Univ, Inst AI Ind Res AIR, Beijing, Peoples R China
[4] Beijing Natl Res Ctr Informat Sci & Technol, Beijing, Peoples R China
[5] Shanghai Artificial Intelligence Lab, Shanghai, Peoples R China
Funding
National Natural Science Foundation of China; National Key Research and Development Program of China
DOI: not available
Abstract
Pre-trained language models (PLMs) have been widely used to underpin various downstream tasks. However, adversarial attack studies have shown that PLMs are vulnerable to small perturbations. Mainstream attack methods adopt a detached two-stage framework and ignore the influence each substitution has on subsequent steps. In this paper, we formally model the adversarial attack task on PLMs as a sequential decision-making problem, in which the whole attack process is a sequence of two decisions at each step, i.e., word finding and word substitution. Since the attack process receives only the final state, with no direct intermediate signals, we propose to use reinforcement learning to find an appropriate sequential attack path for generating adversaries, a method we name SDM-ATTACK. Extensive experimental results show that SDM-ATTACK achieves the highest attack success rate with a comparable modification rate and semantic similarity when attacking fine-tuned BERT. Furthermore, our analyses demonstrate the generalization and transferability of SDM-ATTACK. The code is available at https://github.com/fduxuan/SDM-Attack.
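To make the sequential framing concrete, below is a minimal, hypothetical sketch of the attack loop the abstract describes: at every step the attacker makes a word-finding decision and a word-substitution decision, and the only feedback is the terminal success-or-failure state. All names here (ToyVictim, SYNONYMS, pick_position, attack) are illustrative placeholders, not the authors' SDM-ATTACK API; the real method replaces the random choices with a policy trained by reinforcement learning on the terminal reward.

```python
import random

# Toy synonym table standing in for a real substitution-candidate generator;
# SDM-ATTACK would instead score candidates with a learned policy.
SYNONYMS = {"good": ["great", "fine"], "bad": ["poor", "awful"]}

class ToyVictim:
    """Stand-in for a fine-tuned PLM classifier: labels a sentence 1
    if it contains the word 'good', else 0."""
    def predict(self, text: str) -> int:
        return int("good" in text.split())

def pick_position(words):
    """Word-finding decision. Here: uniform over attackable positions;
    the paper learns this choice with reinforcement learning."""
    candidates = [i for i, w in enumerate(words) if w in SYNONYMS]
    return random.choice(candidates) if candidates else None

def attack(victim, sentence, max_steps=10):
    """Roll out the sequential attack; the only signal is the final
    state (success or failure), matching the terminal-reward setting."""
    words = sentence.split()
    orig_label = victim.predict(sentence)
    for _ in range(max_steps):
        pos = pick_position(words)  # decision 1: which word to perturb
        if pos is None:
            break
        # Decision 2: which replacement to use (here, a random synonym).
        words[pos] = random.choice(SYNONYMS[words[pos]])
        if victim.predict(" ".join(words)) != orig_label:
            return " ".join(words)  # terminal reward 1: adversary found
    return None                     # terminal reward 0: attack failed

if __name__ == "__main__":
    print(attack(ToyVictim(), "the movie was good"))
```

The random policies make the rollout runnable in isolation; the point of the sketch is only the structure of the decision process, in which no per-step reward exists and only the final label flip scores the whole path.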
Pages: 7322-7336 (15 pages)