Data Stealing Attacks against Large Language Models via Backdooring

被引：1

作者：

He, Jiaming ^{[1
]}

Hou, Guanyu ^{[1
]}

Jia, Xinyue ^{[1
]}

Chen, Yangyang ^{[1
]}

Liao, Wenqi ^{[1
]}

Zhou, Yinhang ^{[2
]}

Zhou, Rang ^{[1
]}

机构：

[1] Chengdu Univ Technol, Coll Comp Sci & Cyber Secur, Oxford Brookes Coll, Chengdu 610059, Peoples R China

[2] Shenyang Normal Univ, Software Coll, Shenyang 110034, Peoples R China

来源：

ELECTRONICS | 2024年 / 13卷 / 14期

关键词：

data privacy; large language models; stealing attacks;

D O I：

10.3390/electronics13142858

中图分类号：

TP [自动化技术、计算机技术];

学科分类号：

0812 ;

摘要：

Large language models (LLMs) have gained immense attention and are being increasingly applied in various domains. However, this technological leap forward poses serious security and privacy concerns. This paper explores a novel approach to data stealing attacks by introducing an adaptive method to extract private training data from pre-trained LLMs via backdooring. Our method mainly focuses on the scenario of model customization and is conducted in two phases, including backdoor training and backdoor activation, which allow for the extraction of private information without prior knowledge of the model's architecture or training data. During the model customization stage, attackers inject the backdoor into the pre-trained LLM by poisoning a small ratio of the training dataset. During the inference stage, attackers can extract private information from the third-party knowledge database by incorporating the pre-defined backdoor trigger. Our method leverages the customization process of LLMs, injecting a stealthy backdoor that can be triggered after deployment to retrieve private data. We demonstrate the effectiveness of our proposed attack through extensive experiments, achieving a notable attack success rate. Extensive experiments demonstrate the effectiveness of our stealing attack in popular LLM architectures, as well as stealthiness during normal inference.

引用

页数：19

共 50 条

[1] Defending Large Language Models Against Jailbreak Attacks via Layer-specific Editing
Zhao, Wei
Li, Zhe
Li, Yige
Zhang, Ye
Sun, Jun
arXiv,
[2] Defending Large Language Models Against Jailbreak Attacks via Layer-specific Editing
Zhao, Wei
Li, Zhe
Li, Yige
Zhang, Ye
Sun, Jun
EMNLP 2024 - 2024 Conference on Empirical Methods in Natural Language Processing, Findings of EMNLP 2024, 2024, : 5094 - 5109
[3] JailbreakLens: Visual Analysis of Jailbreak Attacks Against Large Language Models
Feng, Yingchaojie
Chen, Zhizhang
Kang, Zhining
Wang, Sijia
Zhu, Minfeng
Zhang, Wei
Chen, Wei
arXiv,
[4] Membership Inference Attacks against Language Models via Neighbourhood Comparison
Mattern, Justus
Mireshghallah, Fatemehsadat
Jin, Zhijing
Schoelkopf, Bernhard
Sachan, Mrinmaya
Berg-Kirkpatrick, Taylor
FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2023), 2023, : 11330 - 11343
[5] Prompt Stealing Attacks Against Text-to-Image Generation Models
Shen, Xinyue
Qu, Yiting
Backes, Michael
Zhang, Yang
PROCEEDINGS OF THE 33RD USENIX SECURITY SYMPOSIUM, SECURITY 2024, 2024, : 5823 - 5840
[6] Adversarial Attacks on Large Language Models
Zou, Jing
Zhang, Shungeng
Qiu, Meikang
KNOWLEDGE SCIENCE, ENGINEERING AND MANAGEMENT, PT IV, KSEM 2024, 2024, 14887 : 85 - 96
[7] Medical large language models are vulnerable to data-poisoning attacks
Alber, Daniel Alexander
Yang, Zihao
Alyakin, Anton
Yang, Eunice
Rai, Sumedha
Valliani, Aly A.
Zhang, Jeff
Rosenbaum, Gabriel R.
Amend-Thomas, Ashley K.
Kurland, David B.
Kremer, Caroline M.
Eremiev, Alexander
Negash, Bruck
Wiggan, Daniel D.
Nakatsuka, Michelle A.
Sangwon, Karl L.
Neifert, Sean N.
Khan, Hammad A.
Save, Akshay Vinod
Palla, Adhith
Grin, Eric A.
Hedman, Monika
Nasir-Moin, Mustafa
Liu, Xujin Chris
Jiang, Lavender Yao
Mankowski, Michal A.
Segev, Dorry L.
Aphinyanaphongs, Yindalon
Riina, Howard A.
Golfinos, John G.
Orringer, Daniel A.
Kondziolka, Douglas
Oermann, Eric Karl
NATURE MEDICINE, 2025, 31 (02) : 618 - 626
[8] Defending Large Language Models Against Jailbreaking Attacks Through Goal Prioritization
Zhang, Zhexin
Yang, Junxiao
Ke, Pei
Mi, Fei
Wang, Hongning
Huang, Minlie
PROCEEDINGS OF THE 62ND ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, VOL 1: LONG PAPERS, 2024, : 8865 - 8887
[9] Semantic Shield: Defending Vision-Language Models Against Backdooring and Poisoning via Fine-grained Knowledge Alignment
Ishmam, Alvi Md
Thomas, Christopher
2024 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2024, : 24820 - 24830
[10] Fine-Pruning: Defending Against Backdooring Attacks on Deep Neural Networks
Liu, Kang
Dolan-Gavitt, Brendan
Garg, Siddharth
RESEARCH IN ATTACKS, INTRUSIONS, AND DEFENSES, RAID 2018, 2018, 11050 : 273 - 294

← 1 2 3 4 5 →