SneakyPrompt: Jailbreaking Text-to-image Generative Models

Cited by: 1
Authors
Yang, Yuchen [1]
Hui, Bo [1]
Yuan, Haolin [1]
Gong, Neil [2]
Cao, Yinzhi [1]
Affiliations
[1] Johns Hopkins Univ, Baltimore, MD 21218 USA
[2] Duke Univ, Durham, NC USA
Funding
U.S. National Science Foundation
DOI
10.1109/SP54263.2024.00123
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Text-to-image generative models such as Stable Diffusion and DALL·E raise many ethical concerns due to the generation of harmful images such as Not-Safe-for-Work (NSFW) ones. To address these ethical concerns, safety filters are often adopted to prevent the generation of NSFW images. In this work, we propose SneakyPrompt, the first automated attack framework, to jailbreak text-to-image generative models such that they generate NSFW images even if safety filters are adopted. Given a prompt that is blocked by a safety filter, SneakyPrompt repeatedly queries the text-to-image generative model and strategically perturbs tokens in the prompt based on the query results to bypass the safety filter. Specifically, SneakyPrompt utilizes reinforcement learning to guide the perturbation of tokens. Our evaluation shows that SneakyPrompt successfully jailbreaks DALL·E 2 with closed-box safety filters to generate NSFW images. Moreover, we also deploy several state-of-the-art, open-source safety filters on a Stable Diffusion model. Our evaluation shows that SneakyPrompt not only successfully generates NSFW images, but also outperforms existing text adversarial attacks when extended to jailbreak text-to-image generative models, in terms of both the number of queries and the quality of the generated NSFW images. SneakyPrompt is open-source and available at this repository: https://github.com/Yuchen413/text2image_safety.
Pages: 897-912
Number of pages: 16