Data Stealing Attacks against Large Language Models via Backdooring

Cited by: 1

Authors:

He, Jiaming [1]

Hou, Guanyu [1]

Jia, Xinyue [1]

Chen, Yangyang [1]

Liao, Wenqi [1]

Zhou, Yinhang [2]

Zhou, Rang [1]

Affiliations:

[1] Chengdu Univ Technol, Coll Comp Sci & Cyber Secur, Oxford Brookes Coll, Chengdu 610059, Peoples R China

[2] Shenyang Normal Univ, Software Coll, Shenyang 110034, Peoples R China

Source:

ELECTRONICS | 2024 / Vol. 13 / Issue 14

Keywords:

data privacy; large language models; stealing attacks

DOI:

10.3390/electronics13142858

Chinese Library Classification (CLC) number:

TP [Automation Technology, Computer Technology]

Discipline classification code:

0812

Abstract:

Large language models (LLMs) have gained immense attention and are increasingly applied in various domains. However, this technological leap poses serious security and privacy concerns. This paper explores a novel approach to data stealing attacks by introducing an adaptive method to extract private training data from pre-trained LLMs via backdooring. Our method targets the model customization scenario and proceeds in two phases, backdoor training and backdoor activation, which allow private information to be extracted without prior knowledge of the model's architecture or training data. During the model customization stage, attackers inject the backdoor into the pre-trained LLM by poisoning a small ratio of the training dataset. During the inference stage, attackers extract private information from the third-party knowledge database by including the pre-defined backdoor trigger in their input. The method thus leverages the customization process of LLMs to plant a stealthy backdoor that can be triggered after deployment to retrieve private data. Extensive experiments on popular LLM architectures demonstrate the effectiveness of the attack, which achieves a notable attack success rate while remaining stealthy during normal inference.


Pages: 19
