Data Stealing Attacks against Large Language Models via Backdooring

被引:1
|
作者
He, Jiaming [1 ]
Hou, Guanyu [1 ]
Jia, Xinyue [1 ]
Chen, Yangyang [1 ]
Liao, Wenqi [1 ]
Zhou, Yinhang [2 ]
Zhou, Rang [1 ]
机构
[1] Chengdu Univ Technol, Coll Comp Sci & Cyber Secur, Oxford Brookes Coll, Chengdu 610059, Peoples R China
[2] Shenyang Normal Univ, Software Coll, Shenyang 110034, Peoples R China
关键词
data privacy; large language models; stealing attacks;
D O I
10.3390/electronics13142858
中图分类号
TP [自动化技术、计算机技术];
学科分类号
0812 ;
摘要
Large language models (LLMs) have gained immense attention and are being increasingly applied in various domains. However, this technological leap forward poses serious security and privacy concerns. This paper explores a novel approach to data stealing attacks by introducing an adaptive method to extract private training data from pre-trained LLMs via backdooring. Our method mainly focuses on the scenario of model customization and is conducted in two phases, including backdoor training and backdoor activation, which allow for the extraction of private information without prior knowledge of the model's architecture or training data. During the model customization stage, attackers inject the backdoor into the pre-trained LLM by poisoning a small ratio of the training dataset. During the inference stage, attackers can extract private information from the third-party knowledge database by incorporating the pre-defined backdoor trigger. Our method leverages the customization process of LLMs, injecting a stealthy backdoor that can be triggered after deployment to retrieve private data. We demonstrate the effectiveness of our proposed attack through extensive experiments, achieving a notable attack success rate. Extensive experiments demonstrate the effectiveness of our stealing attack in popular LLM architectures, as well as stealthiness during normal inference.
引用
收藏
页数:19
相关论文
共 50 条
  • [31] HARNESSING TASK OVERLOAD FOR SCALABLE JAILBREAK ATTACKS ON LARGE LANGUAGE MODELS
    Dong, Yiting
    Shen, Guobin
    Zhao, Dongcheng
    He, Xiang
    Zeng, Yi
    arXiv,
  • [32] Adversarial Attacks and Defenses in Large Language Models: Old and New Threats
    Schwinn, Leo
    Dobre, David
    Guennemann, Stephan
    Gidel, Gauthier
    PROCEEDINGS ON I CAN'T BELIEVE IT'S NOT BETTER: FAILURE MODES IN THE AGE OF FOUNDATION MODELS AT NEURIPS 2023 WORKSHOPS, 2023, 239 : 103 - 117
  • [33] Demystifying Data Management for Large Language Models
    Miao, Xupeng
    Jia, Zhihao
    Cui, Bin
    COMPANION OF THE 2024 INTERNATIONAL CONFERENCE ON MANAGEMENT OF DATA, SIGMOD-COMPANION 2024, 2024, : 547 - 555
  • [34] Lookin' Out My Backdoor! Investigating Backdooring Attacks Against DL-driven Malware Detectors
    D'Onghia, Mario
    Di Cesare, Federico
    Gallo, Luigi
    Carminati, Michele
    Polino, Mario
    Zanero, Stefano
    PROCEEDINGS OF THE 16TH ACM WORKSHOP ON ARTIFICIAL INTELLIGENCE AND SECURITY, AISEC 2023, 2023, : 209 - 220
  • [35] Adversarial Attacks Against Deep Generative Models on Data: A Survey
    Sun, Hui
    Zhu, Tianqing
    Zhang, Zhiqiu
    Jin, Dawei
    Xiong, Ping
    Zhou, Wanlei
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2023, 35 (04) : 3367 - 3388
  • [36] Data Poisoning Attacks Against Outcome Interpretations of Predictive Models
    Zhang, Hengtong
    Gao, Jing
    Su, Lu
    KDD '21: PROCEEDINGS OF THE 27TH ACM SIGKDD CONFERENCE ON KNOWLEDGE DISCOVERY & DATA MINING, 2021, : 2165 - 2173
  • [37] Large Language Models Are Better Adversaries: Exploring Generative Clean-Label Backdoor Attacks Against Text Classifiers
    You, Wencong
    Hammoudeh, Zayd
    Lowd, Daniel
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (EMNLP 2023), 2023, : 12499 - 12527
  • [38] UnSplit: Data-Oblivious Model Inversion, Model Stealing, and Label Inference Attacks Against Split Learning
    Erdogan, Ege
    Kupcu, Alptekin
    Cicek, A. Ercument
    PROCEEDINGS OF THE 21ST WORKSHOP ON PRIVACY IN THE ELECTRONIC SOCIETY, WPES 2022, 2022, : 115 - 124
  • [39] Stealing Machine Learning Models: Attacks and Countermeasures for Generative Adversarial Networks
    Hu, Hailong
    Pang, Jun
    37TH ANNUAL COMPUTER SECURITY APPLICATIONS CONFERENCE, ACSAC 2021, 2021, : 1 - 16
  • [40] Stealing Machine Learning Parameters via Side Channel Power Attacks
    Wolf, Shaya
    Hu, Hui
    Cooley, Rafer
    Borowczak, Mike
    2021 IEEE COMPUTER SOCIETY ANNUAL SYMPOSIUM ON VLSI (ISVLSI 2021), 2021, : 242 - 247