Data Stealing Attacks against Large Language Models via Backdooring

被引：1

作者：

He, Jiaming ^{[1
]}

Hou, Guanyu ^{[1
]}

Jia, Xinyue ^{[1
]}

Chen, Yangyang ^{[1
]}

Liao, Wenqi ^{[1
]}

Zhou, Yinhang ^{[2
]}

Zhou, Rang ^{[1
]}

机构：

[1] Chengdu Univ Technol, Coll Comp Sci & Cyber Secur, Oxford Brookes Coll, Chengdu 610059, Peoples R China

[2] Shenyang Normal Univ, Software Coll, Shenyang 110034, Peoples R China

来源：

ELECTRONICS | 2024年 / 13卷 / 14期

关键词：

data privacy; large language models; stealing attacks;

D O I：

10.3390/electronics13142858

中图分类号：

TP [自动化技术、计算机技术];

学科分类号：

0812 ;

摘要：

Large language models (LLMs) have gained immense attention and are being increasingly applied in various domains. However, this technological leap forward poses serious security and privacy concerns. This paper explores a novel approach to data stealing attacks by introducing an adaptive method to extract private training data from pre-trained LLMs via backdooring. Our method mainly focuses on the scenario of model customization and is conducted in two phases, including backdoor training and backdoor activation, which allow for the extraction of private information without prior knowledge of the model's architecture or training data. During the model customization stage, attackers inject the backdoor into the pre-trained LLM by poisoning a small ratio of the training dataset. During the inference stage, attackers can extract private information from the third-party knowledge database by incorporating the pre-defined backdoor trigger. Our method leverages the customization process of LLMs, injecting a stealthy backdoor that can be triggered after deployment to retrieve private data. We demonstrate the effectiveness of our proposed attack through extensive experiments, achieving a notable attack success rate. Extensive experiments demonstrate the effectiveness of our stealing attack in popular LLM architectures, as well as stealthiness during normal inference.

引用

页数：19

共 50 条

[31] HARNESSING TASK OVERLOAD FOR SCALABLE JAILBREAK ATTACKS ON LARGE LANGUAGE MODELS
Dong, Yiting
Shen, Guobin
Zhao, Dongcheng
He, Xiang
Zeng, Yi
arXiv,
[32] Adversarial Attacks and Defenses in Large Language Models: Old and New Threats
Schwinn, Leo
Dobre, David
Guennemann, Stephan
Gidel, Gauthier
PROCEEDINGS ON I CAN'T BELIEVE IT'S NOT BETTER: FAILURE MODES IN THE AGE OF FOUNDATION MODELS AT NEURIPS 2023 WORKSHOPS, 2023, 239 : 103 - 117
[33] Demystifying Data Management for Large Language Models
Miao, Xupeng
Jia, Zhihao
Cui, Bin
COMPANION OF THE 2024 INTERNATIONAL CONFERENCE ON MANAGEMENT OF DATA, SIGMOD-COMPANION 2024, 2024, : 547 - 555
[34] Lookin' Out My Backdoor! Investigating Backdooring Attacks Against DL-driven Malware Detectors
D'Onghia, Mario
Di Cesare, Federico
Gallo, Luigi
Carminati, Michele
Polino, Mario
Zanero, Stefano
PROCEEDINGS OF THE 16TH ACM WORKSHOP ON ARTIFICIAL INTELLIGENCE AND SECURITY, AISEC 2023, 2023, : 209 - 220
[35] Adversarial Attacks Against Deep Generative Models on Data: A Survey
Sun, Hui
Zhu, Tianqing
Zhang, Zhiqiu
Jin, Dawei
Xiong, Ping
Zhou, Wanlei
IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2023, 35 (04) : 3367 - 3388
[36] Data Poisoning Attacks Against Outcome Interpretations of Predictive Models
Zhang, Hengtong
Gao, Jing
Su, Lu
KDD '21: PROCEEDINGS OF THE 27TH ACM SIGKDD CONFERENCE ON KNOWLEDGE DISCOVERY & DATA MINING, 2021, : 2165 - 2173
[37] Large Language Models Are Better Adversaries: Exploring Generative Clean-Label Backdoor Attacks Against Text Classifiers
You, Wencong
Hammoudeh, Zayd
Lowd, Daniel
FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (EMNLP 2023), 2023, : 12499 - 12527
[38] UnSplit: Data-Oblivious Model Inversion, Model Stealing, and Label Inference Attacks Against Split Learning
Erdogan, Ege
Kupcu, Alptekin
Cicek, A. Ercument
PROCEEDINGS OF THE 21ST WORKSHOP ON PRIVACY IN THE ELECTRONIC SOCIETY, WPES 2022, 2022, : 115 - 124
[39] Stealing Machine Learning Models: Attacks and Countermeasures for Generative Adversarial Networks
Hu, Hailong
Pang, Jun
37TH ANNUAL COMPUTER SECURITY APPLICATIONS CONFERENCE, ACSAC 2021, 2021, : 1 - 16
[40] Stealing Machine Learning Parameters via Side Channel Power Attacks
Wolf, Shaya
Hu, Hui
Cooley, Rafer
Borowczak, Mike
2021 IEEE COMPUTER SOCIETY ANNUAL SYMPOSIUM ON VLSI (ISVLSI 2021), 2021, : 242 - 247

← 1 2 3 4 5 →