Removing Backdoors in Pre-trained Models by Regularized Continual Pre-training

Cited by: 0
Authors
Zhu, Biru [1 ]
Cui, Ganqu [2 ]
Chen, Yangyi [3 ]
Qin, Yujia [2 ]
Yuan, Lifan [2 ]
Fu, Chong [4 ]
Deng, Yangdong [1 ]
Liu, Zhiyuan [2 ]
Sun, Maosong [2 ]
Gu, Ming [1 ]
Affiliations
[1] Tsinghua Univ, Sch Software, Beijing, Peoples R China
[2] Tsinghua Univ, Dept Comp Sci & Technol, Beijing, Peoples R China
[3] Univ Illinois, Champaign, IL USA
[4] Zhejiang Univ, Zhejiang, Peoples R China
Funding
National Key Research and Development Program of China;
DOI
10.1162/tacl_a_00622
CLC Number
TP18 [Theory of Artificial Intelligence];
Discipline Codes
081104; 0812; 0835; 1405;
Abstract
Recent research has revealed that pre-trained models (PTMs) are vulnerable to backdoor attacks before the fine-tuning stage. Attackers can implant transferable, task-agnostic backdoors in PTMs and control model outputs on any downstream task, which poses severe security threats to all downstream applications. Existing backdoor-removal defenses focus on task-specific classification models and are not suitable for defending PTMs against task-agnostic backdoor attacks. To this end, we propose the first task-agnostic backdoor-removal method for PTMs. Based on the selective activation phenomenon in backdoored PTMs, we design a simple and effective backdoor eraser, which continually pre-trains the backdoored PTMs with a regularization term in an end-to-end manner. The regularization term removes backdoor functionalities from PTMs, while the continual pre-training maintains the normal functionalities of PTMs. We conduct extensive experiments on pre-trained models across different modalities and architectures. The experimental results show that our method can effectively remove backdoors inside PTMs and preserve their benign functionalities using only a small amount of downstream-task-irrelevant auxiliary data, e.g., unlabeled plain texts. The average attack success rate on three downstream datasets is reduced from 99.88% to 8.10% after our defense is applied to the backdoored BERT. The code is publicly available at https://github.com/thunlp/RECIPE.
Pages: 1608-1623
Number of pages: 16
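The abstract above describes the method only at a high level: continue pre-training the backdoored PTM on a small amount of unlabeled, downstream-task-irrelevant auxiliary data while adding a regularization term that suppresses backdoor behavior. The following Python sketch is a rough, hedged illustration of such a joint objective, not the authors' implementation (which is in the linked repository). The model name, auxiliary texts, hyperparameters, and the regularizer itself are all placeholder assumptions; the paper's actual regularization term is derived from its selective-activation analysis.

# Minimal, illustrative sketch of regularized continual pre-training for
# backdoor removal, based only on the high-level description in the abstract.
# ASSUMPTIONS (not from the paper): "bert-base-uncased" stands in for a possibly
# backdoored PTM, the texts are placeholders, and the regularizer merely
# penalizes large hidden activations as a rough proxy for the paper's term.
import torch
from torch.utils.data import DataLoader
from transformers import (AutoModelForMaskedLM, AutoTokenizer,
                          DataCollatorForLanguageModeling)

model_name = "bert-base-uncased"                      # stand-in for a backdoored PTM
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForMaskedLM.from_pretrained(model_name)

# A few downstream-task-irrelevant, unlabeled plain texts (auxiliary data).
texts = ["an unlabeled auxiliary sentence .", "another plain text example ."]
examples = [tokenizer(t, truncation=True, max_length=128) for t in texts]
collator = DataCollatorForLanguageModeling(tokenizer, mlm_probability=0.15)
loader = DataLoader(examples, batch_size=2, collate_fn=collator)

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
reg_weight = 0.1                                      # placeholder trade-off weight

model.train()
for batch in loader:
    outputs = model(**batch, output_hidden_states=True)
    mlm_loss = outputs.loss                           # continual pre-training preserves normal functionality
    # Placeholder regularization term: discourage abnormally large intermediate
    # activations; only mimics the structure of the paper's backdoor-erasing term.
    hidden = torch.stack(outputs.hidden_states[1:])   # (num_layers, batch, seq_len, dim)
    reg_loss = hidden.pow(2).mean()
    loss = mlm_loss + reg_weight * reg_loss           # end-to-end joint objective
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()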