Nightshade: Prompt-Specific Poisoning Attacks on Text-to-Image Generative Models

Times Cited: 0
Authors
Shan, Shawn [1]
Ding, Wenxin [1]
Passananti, Josephine [1]
Wu, Stanley [1]
Zheng, Haitao [1]
Zhao, Ben Y. [1]
Affiliation
[1] Univ Chicago, Dept Comp Sci, Chicago, IL 60637 USA
Keywords
DOI
10.1109/SP54263.2024.00207
CLC Number
TP [Automation Technology, Computer Technology];
Subject Classification Code
0812;
Abstract
Trained on billions of images, diffusion-based text-to-image models seem impervious to traditional data poisoning attacks, which typically require poison samples approaching 20% of the training set. In this paper, we show that state-of-the-art text-to-image generative models are in fact highly vulnerable to poisoning attacks. Our work is driven by two key insights. First, while diffusion models are trained on billions of samples, the number of training samples associated with a specific concept or prompt is generally on the order of thousands. This suggests that these models will be vulnerable to prompt-specific poisoning attacks that corrupt a model's ability to respond to specific targeted prompts. Second, poison samples can be carefully crafted to maximize poison potency to ensure success with very few samples. We introduce Nightshade, a prompt-specific poisoning attack optimized for potency that can completely control the output of a prompt in Stable Diffusion's newest model (SDXL) with less than 100 poisoned training samples. Nightshade also generates stealthy poison images that look visually identical to their benign counterparts, and produces poison effects that "bleed through" to related concepts. More importantly, a moderate number of Nightshade attacks on independent prompts can destabilize a model and disable its ability to generate images for any and all prompts. Finally, we propose the use of Nightshade and similar tools as a defense for content owners against web scrapers that ignore opt-out/do-not-crawl directives, and discuss potential implications for both model trainers and content owners.
Pages: 807-825
Page Count: 19
Related Papers
50 records in total
  • [1] Design Guidelines for Prompt Engineering Text-to-Image Generative Models
    Liu, Vivian
    Chilton, Lydia B.
    PROCEEDINGS OF THE 2022 CHI CONFERENCE ON HUMAN FACTORS IN COMPUTING SYSTEMS (CHI '22), 2022
  • [2] Prompt Stealing Attacks Against Text-to-Image Generation Models
    Shen, Xinyue
    Qu, Yiting
    Backes, Michael
    Zhang, Yang
    PROCEEDINGS OF THE 33RD USENIX SECURITY SYMPOSIUM, SECURITY 2024, 2024: 5823-5840
  • [3] DIFFUSIONDB: A Large-scale Prompt Gallery Dataset for Text-to-Image Generative Models
    Wang, Zijie J.
    Montoya, Evan
    Munechika, David
    Yang, Haoyang
    Hoover, Benjamin
    Chau, Duen Horng
    PROCEEDINGS OF THE 61ST ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, ACL 2023, VOL 1, 2023: 893-911
  • [4] Resolving Ambiguities in Text-to-Image Generative Models
    Mehrabi, Ninareh
    Goyal, Palash
    Verma, Apurv
    Dhamala, Jwala
    Kumar, Varun
    Hu, Qian
    Chang, Kai-Wei
    Zemel, Richard
    Galstyan, Aram
    Gupta, Rahul
    PROCEEDINGS OF THE 61ST ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2023): LONG PAPERS, VOL 1, 2023: 14367-14388
  • [5] Typology of Risks of Generative Text-to-Image Models
    Bird, Charlotte
    Ungless, Eddie L.
    Kasirzadeh, Atoosa
    PROCEEDINGS OF THE 2023 AAAI/ACM CONFERENCE ON AI, ETHICS, AND SOCIETY, AIES 2023, 2023: 396-410
  • [6] SneakyPrompt: Jailbreaking Text-to-image Generative Models
    Yang, Yuchen
    Hui, Bo
    Yuan, Haolin
    Gong, Neil
    Cao, Yinzhi
    45TH IEEE SYMPOSIUM ON SECURITY AND PRIVACY, SP 2024, 2024: 897-912
  • [7] Advancements in adversarial generative text-to-image models: a review
    Zaghloul, Rawan
    Rawashdeh, Enas
    Bani-Ata, Tomader
    IMAGING SCIENCE JOURNAL, 2024
  • [8] Example-Based Conditioning for Text-to-Image Generative Models
    Takada, Atsushi
    Kawabe, Wataru
    Sugano, Yusuke
    IEEE ACCESS, 2024, 12: 162191-162203
  • [9] BAGM: A Backdoor Attack for Manipulating Text-to-Image Generative Models
    Vice, Jordan
    Akhtar, Naveed
    Hartley, Richard
    Mian, Ajmal
    IEEE TRANSACTIONS ON INFORMATION FORENSICS AND SECURITY, 2024, 19: 4865-4880
  • [10] Poisoning Attacks via Generative Adversarial Text to Image Synthesis
    Kasichainula, Keshav
    Mansourifar, Hadi
    Shi, Weidong
    51ST ANNUAL IEEE/IFIP INTERNATIONAL CONFERENCE ON DEPENDABLE SYSTEMS AND NETWORKS (DSN-W 2021), 2021: 158-165