Mitigating Exaggerated Safety in Large Language Models

Cited by: 0
Authors
Ray, Ruchira [1 ]
Bhalani, Ruchi [1 ]
Affiliation
[1] University of Texas at Austin, Department of Computer Science, United States
DOI
Not available
Related Papers
50 records in total
  • [31] Large Language Models
    Cerf, Vinton G.
    COMMUNICATIONS OF THE ACM, 2023, 66 (08) : 7 - 7
  • [32] Mitigating Privacy Seesaw in Large Language Models: Augmented Privacy Neuron Editing via Activation Patching
    Wu, Xinwei
    Dong, Weilong
    Xu, Shaoyang
    Xiong, Deyi
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: ACL 2024, 2024, : 5319 - 5332
  • [33] Towards Understanding and Mitigating Social Biases in Language Models
    Liang, Paul Pu
    Wu, Chiyu
    Morency, Louis-Philippe
    Salakhutdinov, Ruslan
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 139, 2021, 139
  • [34] Potential use of large language models for mitigating students' problematic social media use: ChatGPT as an example
    Liu, Xin-Qiao
    Zhang, Zi-Ru
    WORLD JOURNAL OF PSYCHIATRY, 2024, 14 (03)
  • [35] KNOWLEDGE UNLEARNING FOR MITIGATING PRIVACY RISKS IN LANGUAGE MODELS
    Jang, Joel
    Yoon, Dongkeun
    Yang, Sohee
    Cha, Sungmin
    Lee, Moontae
    Logeswaran, Lajanugen
    Seo, Minjoon
    arXiv, 2022.
  • [36] Knowledge Unlearning for Mitigating Privacy Risks in Language Models
    Jang, Joel
    Yoon, Dongkeun
    Yang, Sohee
    Cha, Sungmin
    Lee, Moontae
    Logeswaran, Lajanugen
    Seo, Minjoon
    PROCEEDINGS OF THE 61ST ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2023): LONG PAPERS, VOL 1, 2023, : 14389 - 14408
  • [37] A survey of safety and trustworthiness of large language models through the lens of verification and validation
    Huang, Xiaowei
    Ruan, Wenjie
    Huang, Wei
    Jin, Gaojie
    Dong, Yi
    Wu, Changshun
    Bensalem, Saddek
    Mu, Ronghui
    Qi, Yi
    Zhao, Xingyu
    Cai, Kaiwen
    Zhang, Yanghao
    Wu, Sihao
    Xu, Peipei
    Wu, Dengyu
    Freitas, Andre
    Mustafa, Mustafa A.
    ARTIFICIAL INTELLIGENCE REVIEW, 2024, 57 (07)
  • [38] MM-SafetyBench: A Benchmark for Safety Evaluation of Multimodal Large Language Models
    Liu, Xin
    Zhu, Yichen
    Gu, Jindong
    Lan, Yunshi
    Yang, Chao
    Qiao, Yu
    COMPUTER VISION - ECCV 2024, PT LVI, 2025, 15114 : 386 - 403
  • [39] Shaping the Safety Boundaries: Understanding and Defending Against Jailbreaks in Large Language Models
    Gao, Lang
    Zhang, Xiangliang
    Nakov, Preslav
    Chen, Xiuying
    arXiv.
  • [40] Identifying Exaggerated Language
    Kong, Li
    Li, Chuanyi
    Ge, Jidong
    Luo, Bin
    Ng, Vincent
    PROCEEDINGS OF THE 2020 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP), 2020, : 7024 - 7034