Tree of Attacks: Jailbreaking Black-Box LLMs Automatically

Cited by: 0
Authors
Mehrotra, Anay [1]
Zampetakis, Manolis [2]
Kassianik, Paul [3]
Nelson, Blaine [3]
Anderson, Hyrum [3]
Singer, Yaron [3]
Karbasi, Amin [4]
Affiliations
[1] Yale University, Robust Intelligence, United States
[2] Yale University, United States
[3] Robust Intelligence, United States
[4] Yale University, Google Research, United States
Source
arXiv | 2023
Keywords
Compendex
DOI
Not available
Abstract
Iterative methods
Related papers (50 in total)
  • [31] Multi-Agent Attacks for Black-Box Social Recommendations
    Wang, Shijie
    Fan, Wenqi
    Wei, Xiao-yong
    Mei, Xiaowei
    Lin, Shanru
    Li, Qing
    ACM TRANSACTIONS ON INFORMATION SYSTEMS, 2025, 43 (01)
  • [33] Black-box Detection of Backdoor Attacks with Limited Information and Data
    Dong, Yinpeng
    Yang, Xiao
    Deng, Zhijie
    Pang, Tianyu
    Xiao, Zihao
    Su, Hang
    Zhu, Jun
    2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021), 2021: 16462-16471
  • [34] Black-Box Adversarial Attacks against Audio Forensics Models
    Jiang, Yi
    Ye, Dengpan
    SECURITY AND COMMUNICATION NETWORKS, 2022, 2022
  • [35] AutoAttacker: A reinforcement learning approach for black-box adversarial attacks
    Tsingenopoulos, Ilias
    Preuveneers, Davy
    Joosen, Wouter
    2019 4TH IEEE EUROPEAN SYMPOSIUM ON SECURITY AND PRIVACY WORKSHOPS (EUROS&PW), 2019: 229-237
  • [36] Inspecting Prediction Confidence for Detecting Black-Box Backdoor Attacks
    Wang, Tong
    Yao, Yuan
    Xu, Feng
    Xu, Miao
    An, Shengwei
    Wang, Ting
    THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 1, 2024: 274-282
  • [37] Query-based Local Black-box Adversarial Attacks
    Shi, Jing
    Zhang, Xiaolin
    Xu, Enhui
    Wang, Yongping
    Zhang, Wenwen
    INTERNATIONAL JOURNAL OF NETWORK SECURITY, 2023, 25 (06): 1048-1058
  • [38] An Adaptive Black-Box Defense Against Trojan Attacks (TROJDEF)
    Liu, Guanxiong
    Khreishah, Abdallah
    Sharadgah, Fatima
    Khalil, Issa
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2024, 35 (04): 5367-5381
  • [39] Spanning attack: reinforce black-box attacks with unlabeled data
    Wang, Lu
    Zhang, Huan
    Yi, Jinfeng
    Hsieh, Cho-Jui
    Jiang, Yuan
    MACHINE LEARNING, 2020, 109 (12): 2349-2368
  • [40] Imitation Attacks and Defenses for Black-box Machine Translation Systems
    Wallace, Eric
    Stern, Mitchell
    Song, Dawn
    PROCEEDINGS OF THE 2020 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP), 2020: 5531-5546