50 entries in total
- [2] Defending Large Language Models Against Jailbreak Attacks via Layer-Specific Editing. Findings of EMNLP 2024, 2024: 5094-5109
- [4] Membership Inference Attacks against Language Models via Neighbourhood Comparison. Findings of ACL 2023, 2023: 11330-11343
- [5] Prompt Stealing Attacks Against Text-to-Image Generation Models. Proceedings of the 33rd USENIX Security Symposium, 2024: 5823-5840
- [6] Adversarial Attacks on Large Language Models. Knowledge Science, Engineering and Management (KSEM 2024), Part IV, vol. 14887, 2024: 85-96
- [8] Defending Large Language Models Against Jailbreaking Attacks Through Goal Prioritization. Proceedings of the 62nd Annual Meeting of the ACL, Vol. 1: Long Papers, 2024: 8865-8887
- [9] Semantic Shield: Defending Vision-Language Models Against Backdooring and Poisoning via Fine-Grained Knowledge Alignment. 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024: 24820-24830
- [10] Fine-Pruning: Defending Against Backdooring Attacks on Deep Neural Networks. Research in Attacks, Intrusions, and Defenses (RAID 2018), vol. 11050, 2018: 273-294