Nightshade: Prompt-Specific Poisoning Attacks on Text-to-Image Generative Models

Citations: 0
Authors
Shan, Shawn [1]
Ding, Wenxin [1]
Passananti, Josephine [1]
Wu, Stanley [1]
Zheng, Haitao [1]
Zhao, Ben Y. [1]
Affiliations
[1] Univ Chicago, Dept Comp Sci, Chicago, IL 60637 USA
DOI
10.1109/SP54263.2024.00207
Chinese Library Classification (CLC): TP [automation technology, computer technology]
Discipline code: 0812
Abstract
Trained on billions of images, diffusion-based text-to-image models seem impervious to traditional data poisoning attacks, which typically require poison samples approaching 20% of the training set. In this paper, we show that state-of-the-art text-to-image generative models are in fact highly vulnerable to poisoning attacks. Our work is driven by two key insights. First, while diffusion models are trained on billions of samples, the number of training samples associated with a specific concept or prompt is generally on the order of thousands. This suggests that these models will be vulnerable to prompt-specific poisoning attacks that corrupt a model's ability to respond to specific targeted prompts. Second, poison samples can be carefully crafted to maximize poison potency to ensure success with very few samples. We introduce Nightshade, a prompt-specific poisoning attack optimized for potency that can completely control the output of a prompt in Stable Diffusion's newest model (SDXL) with less than 100 poisoned training samples. Nightshade also generates stealthy poison images that look visually identical to their benign counterparts, and produces poison effects that "bleed through" to related concepts. More importantly, a moderate number of Nightshade attacks on independent prompts can destabilize a model and disable its ability to generate images for any and all prompts. Finally, we propose the use of Nightshade and similar tools as a defense for content owners against web scrapers that ignore opt-out/do-not-crawl directives, and discuss potential implications for both model trainers and content owners.
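The abstract's core mechanism — optimizing a small, bounded perturbation so that a benign-looking image's feature representation matches that of an unrelated target concept — can be illustrated in miniature. The following is a hypothetical toy sketch, not the paper's method: it substitutes a random linear map for the diffusion model's real image encoder, and all names and parameters (`W`, `eps`, `lr`, etc.) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in: a linear "feature extractor" replaces the
# diffusion model's image encoder used by the actual attack.
D, F = 64, 16                      # flattened image size, feature dim
W = rng.normal(size=(F, D)) / np.sqrt(D)
feat = lambda x: W @ x

x_benign = rng.uniform(0, 1, D)    # image whose caption names concept C
x_anchor = rng.uniform(0, 1, D)    # image depicting target concept A
f_target = feat(x_anchor)

# Small L-inf budget keeps the poison visually close to the benign image.
eps, lr, steps = 0.05, 0.05, 300
x = x_benign.copy()
for _ in range(steps):
    # gradient of ||feat(x) - f_target||^2 with respect to x
    grad = 2 * W.T @ (feat(x) - f_target)
    x = x - lr * grad
    # project back into the eps-ball around the benign image, then [0, 1]
    x = np.clip(x, x_benign - eps, x_benign + eps)
    x = np.clip(x, 0, 1)
```

After the loop, `x` differs from `x_benign` by at most `eps` per pixel, yet its features have moved toward the target concept — the projected-gradient structure behind a stealthy poison sample, stripped of the perceptual models and diffusion-specific machinery the real attack relies on.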
Pages: 807-825 (19 pages)