UOR: Universal Backdoor Attacks on Pre-trained Language Models

Times Cited: 0
Authors
Du, Wei [1 ]
Li, Peixuan [1 ]
Zhao, Haodong [1 ]
Ju, Tianjie [1 ]
Ren, Ge [1 ]
Liu, Gongshen [1 ]
Affiliations
[1] Shanghai Jiao Tong Univ, Sch Cyber Sci & Engn, Shanghai, Peoples R China
Funding
National Key Research and Development Program of China; National Natural Science Foundation of China;
Keywords
(none listed)
DOI
Not available
CLC Number
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104; 0812; 0835; 1405;
Abstract
Task-agnostic and transferable backdoors implanted in pre-trained language models (PLMs) pose a severe security threat, as they can be inherited by any downstream task. However, existing methods rely on manual selection of triggers and backdoor representations, which limits their effectiveness and universality across different PLMs and usage paradigms. In this paper, we propose a new backdoor attack method, UOR, which overcomes these limitations by replacing manual selection with automatic optimization. Specifically, we design poisoned supervised contrastive learning, which automatically learns more uniform and universal backdoor representations. This yields more even coverage of the output space, and thus hits more labels in downstream tasks after fine-tuning. Furthermore, we use gradient search to select trigger words that can be adapted to different PLMs and vocabularies. Experiments show that UOR achieves better attack performance than manual methods on various text classification tasks. Moreover, we evaluate UOR on PLMs with different architectures, different usage paradigms, and more challenging tasks, where it achieves higher universality scores.
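The core idea of the abstract — pulling each trigger's representations into its own tight, well-separated cluster so that the clusters cover the output space — can be illustrated with a small sketch of a supervised contrastive loss. This is a hedged, hypothetical illustration, not the paper's actual implementation: the function name `supcon_loss`, the temperature `tau=0.1`, and the toy data are all invented here, and the loss shown is the standard SupCon form (Khosla et al.), where every sample stamped with the same trigger shares a "class" label.

```python
import numpy as np

def supcon_loss(z, labels, tau=0.1):
    """Supervised contrastive loss over L2-normalized representations z (n, d).

    In a poisoned setting, `labels` marks which trigger (if any) was inserted
    into each sample, so all samples carrying the same trigger are treated as
    positives and pulled toward a shared backdoor representation, while
    different triggers are pushed apart.
    """
    n = z.shape[0]
    sim = z @ z.T / tau                         # scaled pairwise cosine similarity
    np.fill_diagonal(sim, -np.inf)              # exclude self-pairs from the denominator
    m = sim.max(axis=1, keepdims=True)          # numerically stable log-sum-exp per row
    log_denom = m + np.log(np.exp(sim - m).sum(axis=1, keepdims=True))
    log_prob = sim - log_denom                  # log p(anchor i picks sample j)
    pos = (labels[:, None] == labels[None, :]) & ~np.eye(n, dtype=bool)
    loss_i = -np.where(pos, log_prob, 0.0).sum(axis=1) / np.maximum(pos.sum(axis=1), 1)
    return float(loss_i.mean())

rng = np.random.default_rng(0)
# Three hypothetical triggers, each tied to a well-separated target direction.
targets = np.eye(3, 8)                          # rows e0, e1, e2 in R^8
labels = np.repeat(np.arange(3), 4)             # 4 poisoned samples per trigger
clustered = targets[labels] + 0.05 * rng.normal(size=(12, 8))
clustered /= np.linalg.norm(clustered, axis=1, keepdims=True)
scattered = rng.normal(size=(12, 8))            # same labels, untrained representations
scattered /= np.linalg.norm(scattered, axis=1, keepdims=True)

l_good = supcon_loss(clustered, labels)         # tight per-trigger clusters: low loss
l_bad = supcon_loss(scattered, labels)          # random spread: high loss
```

Minimizing this loss drives same-trigger representations together and distinct triggers apart, which is one plausible way to obtain the "uniform" backdoor representations the abstract describes; the paper's actual poisoned objective and the gradient search over trigger words are not sketched here.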
Pages: 7865-7877 (13 pages)
Related Papers (50 total)
  • [1] Wei, Cheng'an; Lee, Yeonjoon; Chen, Kai; Meng, Guozhu; Lv, Peizhuo. Aliasing Backdoor Attacks on Pre-trained Models. Proceedings of the 32nd USENIX Security Symposium, 2023: 2707-2724.
  • [2] Li, Linyang; Song, Demin; Li, Xiaonan; Zeng, Jiehang; Ma, Ruotian; Qiu, Xipeng. Backdoor Attacks on Pre-trained Models by Layerwise Weight Poisoning. 2021 Conference on Empirical Methods in Natural Language Processing (EMNLP 2021), 2021: 3023-3032.
  • [3] Zhang, Zhengyan; Xiao, Guangxuan; Li, Yongwei; Lv, Tian; Qi, Fanchao; Liu, Zhiyuan; Wang, Yasheng; Jiang, Xin; Sun, Maosong. Red Alarm for Pre-trained Models: Universal Vulnerability to Neuron-level Backdoor Attacks. Machine Intelligence Research, 2023, 20(2): 180-193.
  • [4] He, Xinyu; Hao, Fengrui; Gu, Tianlong; Chang, Liang. CBAs: Character-level Backdoor Attacks against Chinese Pre-trained Language Models. ACM Transactions on Privacy and Security, 2024, 27(3).
  • [5] Xi, Zhaohan; Du, Tianyu; Li, Changjiang; Pang, Ren; Ji, Shouling; Chen, Jinghui; Ma, Fenglong; Wang, Ting. Defending Pre-trained Language Models as Few-shot Learners against Backdoor Attacks. Advances in Neural Information Processing Systems 36 (NeurIPS 2023), 2023.
  • [6] Liu, Zhengxiao; Shen, Bowen; Lin, Zheng; Wang, Fali; Wang, Weiping. Maximum Entropy Loss, the Silver Bullet Targeting Backdoor Attacks in Pre-trained Language Models. Findings of the Association for Computational Linguistics: ACL 2023, 2023: 3850-3868.
  • [7] Li, Yanzhou; Liu, Shangqing; Chen, Kangjie; Xie, Xiaofei; Zhang, Tianwei; Liu, Yang. Multi-target Backdoor Attacks for Code Pre-trained Models. Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (ACL 2023), Vol. 1, 2023: 7236-7254.
  • [8] Du, Wei; Zhao, Yichun; Li, Boqun; Liu, Gongshen; Wang, Shilin. PPT: Backdoor Attacks on Pre-trained Models via Poisoned Prompt Tuning. Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence (IJCAI 2022), 2022: 680-686.
  • [9] Wang, Shuo; Nepal, Surya; Rudolph, Carsten; Grobler, Marthie; Chen, Shangyu; Chen, Tianle. Backdoor Attacks Against Transfer Learning With Pre-Trained Deep Learning Models. IEEE Transactions on Services Computing, 2022, 15(3): 1526-1539.
  • [10] Zhu, Biru; Qin, Yujia; Cui, Ganqu; Chen, Yangyi; Zhao, Weilin; Fu, Chong; Deng, Yangdong; Liu, Zhiyuan; Wang, Jingang; Wu, Wei; Sun, Maosong; Gu, Ming. Moderate-fitting as a Natural Backdoor Defender for Pre-trained Language Models. Advances in Neural Information Processing Systems 35 (NeurIPS 2022), 2022.