Data Stealing Attacks against Large Language Models via Backdooring

被引:1
|
作者
He, Jiaming [1 ]
Hou, Guanyu [1 ]
Jia, Xinyue [1 ]
Chen, Yangyang [1 ]
Liao, Wenqi [1 ]
Zhou, Yinhang [2 ]
Zhou, Rang [1 ]
机构
[1] Chengdu Univ Technol, Coll Comp Sci & Cyber Secur, Oxford Brookes Coll, Chengdu 610059, Peoples R China
[2] Shenyang Normal Univ, Software Coll, Shenyang 110034, Peoples R China
关键词
data privacy; large language models; stealing attacks;
D O I
10.3390/electronics13142858
中图分类号
TP [自动化技术、计算机技术];
学科分类号
0812 ;
摘要
Large language models (LLMs) have gained immense attention and are being increasingly applied in various domains. However, this technological leap forward poses serious security and privacy concerns. This paper explores a novel approach to data stealing attacks by introducing an adaptive method to extract private training data from pre-trained LLMs via backdooring. Our method mainly focuses on the scenario of model customization and is conducted in two phases, including backdoor training and backdoor activation, which allow for the extraction of private information without prior knowledge of the model's architecture or training data. During the model customization stage, attackers inject the backdoor into the pre-trained LLM by poisoning a small ratio of the training dataset. During the inference stage, attackers can extract private information from the third-party knowledge database by incorporating the pre-defined backdoor trigger. Our method leverages the customization process of LLMs, injecting a stealthy backdoor that can be triggered after deployment to retrieve private data. We demonstrate the effectiveness of our proposed attack through extensive experiments, achieving a notable attack success rate. Extensive experiments demonstrate the effectiveness of our stealing attack in popular LLM architectures, as well as stealthiness during normal inference.
引用
收藏
页数:19
相关论文
共 50 条
  • [1] Defending Large Language Models Against Jailbreak Attacks via Layer-specific Editing
    Zhao, Wei
    Li, Zhe
    Li, Yige
    Zhang, Ye
    Sun, Jun
    arXiv,
  • [2] Defending Large Language Models Against Jailbreak Attacks via Layer-specific Editing
    Zhao, Wei
    Li, Zhe
    Li, Yige
    Zhang, Ye
    Sun, Jun
    EMNLP 2024 - 2024 Conference on Empirical Methods in Natural Language Processing, Findings of EMNLP 2024, 2024, : 5094 - 5109
  • [3] JailbreakLens: Visual Analysis of Jailbreak Attacks Against Large Language Models
    Feng, Yingchaojie
    Chen, Zhizhang
    Kang, Zhining
    Wang, Sijia
    Zhu, Minfeng
    Zhang, Wei
    Chen, Wei
    arXiv,
  • [4] Membership Inference Attacks against Language Models via Neighbourhood Comparison
    Mattern, Justus
    Mireshghallah, Fatemehsadat
    Jin, Zhijing
    Schoelkopf, Bernhard
    Sachan, Mrinmaya
    Berg-Kirkpatrick, Taylor
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2023), 2023, : 11330 - 11343
  • [5] Prompt Stealing Attacks Against Text-to-Image Generation Models
    Shen, Xinyue
    Qu, Yiting
    Backes, Michael
    Zhang, Yang
    PROCEEDINGS OF THE 33RD USENIX SECURITY SYMPOSIUM, SECURITY 2024, 2024, : 5823 - 5840
  • [6] Adversarial Attacks on Large Language Models
    Zou, Jing
    Zhang, Shungeng
    Qiu, Meikang
    KNOWLEDGE SCIENCE, ENGINEERING AND MANAGEMENT, PT IV, KSEM 2024, 2024, 14887 : 85 - 96
  • [7] Medical large language models are vulnerable to data-poisoning attacks
    Alber, Daniel Alexander
    Yang, Zihao
    Alyakin, Anton
    Yang, Eunice
    Rai, Sumedha
    Valliani, Aly A.
    Zhang, Jeff
    Rosenbaum, Gabriel R.
    Amend-Thomas, Ashley K.
    Kurland, David B.
    Kremer, Caroline M.
    Eremiev, Alexander
    Negash, Bruck
    Wiggan, Daniel D.
    Nakatsuka, Michelle A.
    Sangwon, Karl L.
    Neifert, Sean N.
    Khan, Hammad A.
    Save, Akshay Vinod
    Palla, Adhith
    Grin, Eric A.
    Hedman, Monika
    Nasir-Moin, Mustafa
    Liu, Xujin Chris
    Jiang, Lavender Yao
    Mankowski, Michal A.
    Segev, Dorry L.
    Aphinyanaphongs, Yindalon
    Riina, Howard A.
    Golfinos, John G.
    Orringer, Daniel A.
    Kondziolka, Douglas
    Oermann, Eric Karl
    NATURE MEDICINE, 2025, 31 (02) : 618 - 626
  • [8] Defending Large Language Models Against Jailbreaking Attacks Through Goal Prioritization
    Zhang, Zhexin
    Yang, Junxiao
    Ke, Pei
    Mi, Fei
    Wang, Hongning
    Huang, Minlie
    PROCEEDINGS OF THE 62ND ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, VOL 1: LONG PAPERS, 2024, : 8865 - 8887
  • [9] Semantic Shield: Defending Vision-Language Models Against Backdooring and Poisoning via Fine-grained Knowledge Alignment
    Ishmam, Alvi Md
    Thomas, Christopher
    2024 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2024, : 24820 - 24830
  • [10] Fine-Pruning: Defending Against Backdooring Attacks on Deep Neural Networks
    Liu, Kang
    Dolan-Gavitt, Brendan
    Garg, Siddharth
    RESEARCH IN ATTACKS, INTRUSIONS, AND DEFENSES, RAID 2018, 2018, 11050 : 273 - 294