Improved Techniques for Optimization-Based Jailbreaking on Large Language Models

Cited by: 0
Authors
Jia, Xiaojun [1]
Pang, Tianyu [2]
Du, Chao [2]
Huang, Yihao [1]
Gu, Jindong [3]
Liu, Yang [1]
Cao, Xiaochun [4]
Lin, Min [2]
Affiliations
[1] Nanyang Technological University, Singapore
[2] Sea AI Lab, Singapore
[3] University of Oxford, Oxford, United Kingdom
[4] School of Cyber Science and Technology, Sun Yat-Sen University, Shenzhen Campus, China
Indexing: Compilation and indexing terms, Copyright 2024 Elsevier Inc.
DOI: not available
Related papers (50 in total)
  • [1] EasyJailbreak: A Unified Framework for Jailbreaking Large Language Models
    Zhou, Weikang
    Wang, Xiao
    Xiong, Limao
    Xia, Han
    Gu, Yingshuang
    Chai, Mingxu
    Zhu, Fukang
    Huang, Caishuang
    Dou, Shihan
    Xi, Zhiheng
    Zheng, Rui
    Gao, Songyang
    Zou, Yicheng
    Yan, Hang
    Le, Yifan
    Wang, Ruohui
    Li, Lijun
    Shao, Jing
    Gui, Tao
    Zhang, Qi
    Huang, Xuanjing
    arXiv.
  • [2] Jailbreaking Black Box Large Language Models in Twenty Queries
    Chao, Patrick
    Robey, Alexander
    Dobriban, Edgar
    Hassani, Hamed
    Pappas, George J.
    Wong, Eric
    arXiv, 2023.
  • [3] Unleashing the Unseen: Harnessing Benign Datasets for Jailbreaking Large Language Models
    Zhao, Wei
    Li, Zhe
    Li, Yige
    Sun, Jun
    arXiv.
  • [4] The Dark Side of Function Calling: Pathways to Jailbreaking Large Language Models
    School of Computer Science and Technology, Xidian University, China
    arXiv.
  • [5] Cognitive Overload: Jailbreaking Large Language Models with Overloaded Logical Thinking
    Xu, Nan
    Wang, Fei
    Zhou, Ben
    Li, Bangzheng
    Xiao, Chaowei
    Chen, Muhao
    Findings of the Association for Computational Linguistics: NAACL 2024, 2024, pp. 3526-3548.
  • [6] JailbreakZoo: Survey, Landscapes, and Horizons in Jailbreaking Large Language and Vision-Language Models
    Jin, Haibo
    Hu, Leyang
    Li, Xinnuo
    Zhang, Peiyan
    Chen, Chonghan
    Zhuang, Jun
    Wang, Haohan
    arXiv.
  • [7] Open Sesame! Universal Black-Box Jailbreaking of Large Language Models
    Lapid, Raz
    Langberg, Ron
    Sipper, Moshe
    Applied Sciences-Basel, 2024, 14(16).
  • [8] Defending Large Language Models Against Jailbreaking Attacks Through Goal Prioritization
    Zhang, Zhexin
    Yang, Junxiao
    Ke, Pei
    Mi, Fei
    Wang, Hongning
    Huang, Minlie
    Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics, Vol. 1: Long Papers, 2024, pp. 8865-8887.
  • [9] An Improved Weighted Optimization-based Framework for Large-scale MOPs
    Zheng, Junhao
    Li, Lingjie
    Lin, Qiuzhen
    Ming, Zhong
    2021 IEEE Congress on Evolutionary Computation (CEC 2021), 2021, pp. 2156-2163.
  • [10] A Survey of Testing Techniques Based on Large Language Models
    Qi, Fei
    Hou, Yingnan
    Lin, Ning
    Bao, Shanshan
    Xu, Nuo
    Proceedings of the 2024 International Conference on Computer and Multimedia Technology (ICCMT 2024), 2024, pp. 280-284.