Tree of Attacks: Jailbreaking Black-Box LLMs Automatically

Cited by: 0
Authors
Mehrotra, Anay [1 ]
Zampetakis, Manolis [2 ]
Kassianik, Paul [3 ]
Nelson, Blaine [3 ]
Anderson, Hyrum [3 ]
Singer, Yaron [3 ]
Karbasi, Amin [4 ]
Affiliations
[1] Yale University, Robust Intelligence, United States
[2] Yale University, United States
[3] Robust Intelligence, United States
[4] Yale University, Google Research, United States
Source
arXiv, 2023
Keywords
Compendex;
DOI
Not available
Abstract
Iterative methods
Related papers (50 in total)
  • [1] Black-Box Reconstruction Attacks on LLMs: A Preliminary Study in Code Summarization
    Russodivito, Marco
    Spina, Angelica
    Scalabrino, Simone
    Oliveto, Rocco
    QUALITY OF INFORMATION AND COMMUNICATIONS TECHNOLOGY, QUATIC 2024, 2024, 2178 : 391 - 398
  • [2] Data Contamination Calibration for Black-box LLMs
    Ye, Wentao
    Hu, Jiaqi
    Li, Liyao
    Wang, Haobo
    Chen, Gang
    Zhao, Junbo
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: ACL 2024, 2024, : 10845 - 10861
  • [3] Open Sesame! Universal Black-Box Jailbreaking of Large Language Models
    Lapid, Raz
    Langberg, Ron
    Sipper, Moshe
    APPLIED SCIENCES-BASEL, 2024, 14 (16):
  • [4] Defending LLMs against Jailbreaking Attacks via Backtranslation
    Wang, Yihan
    Shi, Zhouxing
    Bai, Andrew
    Hsieh, Cho-Jui
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: ACL 2024, 2024, : 16031 - 16046
  • [5] Simple Black-box Adversarial Attacks
    Guo, Chuan
    Gardner, Jacob R.
    You, Yurong
    Wilson, Andrew Gordon
    Weinberger, Kilian Q.
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 97, 2019, 97
  • [6] An Algorithm for Automatically Generating Black-Box Test Cases
    Xu, Baowen
    Nie, Changhai
    Shi, Qunfeng
    Lu, Hong
    Journal of Electronics (China), 2003, (01) : 74 - 77
  • [7] Black-Box Data Poisoning Attacks on Crowdsourcing
    Chen, Pengpeng
    Yang, Yongqiang
    Yang, Dingqi
    Sun, Hailong
    Chen, Zhijun
    Lin, Peng
    PROCEEDINGS OF THE THIRTY-SECOND INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, IJCAI 2023, 2023, : 2975 - 2983
  • [8] Toward Visual Distortion in Black-Box Attacks
    Li, Nannan
    Chen, Zhenzhong
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2021, 30 : 6156 - 6167
  • [9] Resiliency of SNN on Black-Box Adversarial Attacks
    Paudel, Bijay Raj
    Itani, Aashish
    Tragoudas, Spyros
    20TH IEEE INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND APPLICATIONS (ICMLA 2021), 2021, : 799 - 806
  • [10] SoK: Pitfalls in Evaluating Black-Box Attacks
    Suya, Fnu
    Suri, Anshuman
    Zhang, Tingwei
    Hong, Jingtao
    Tian, Yuan
    Evans, David
    IEEE CONFERENCE ON SAFE AND TRUSTWORTHY MACHINE LEARNING, SATML 2024, 2024, : 387 - 407