Separate the Wheat from the Chaff: A Post-Hoc Approach to Safety Re-Alignment for Fine-Tuned Language Models

Cited by: 0
Authors
Wu, Di [1]
Lu, Xin [1]
Zhao, Yanyan [1]
Qin, Bing [1]
Affiliation
[1] Research Center for Social Computing and Information Retrieval, Harbin Institute of Technology, China
Keywords
Compilation and indexing terms; Copyright 2025 Elsevier Inc;
DOI
Not available
Abstract
Benchmarking - Problem oriented languages
Related papers
11 in total
  • [1] Distilling Semantic Concept Embeddings from Contrastively Fine-Tuned Language Models
    Li, Na
    Kteich, Hanane
    Bouraoui, Zied
    Schockaert, Steven
    PROCEEDINGS OF THE 46TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL, SIGIR 2023, 2023, : 216 - 226
  • [2] How to detect propaganda from social media? Exploitation of semantic and fine-tuned language models
    Malik, Muhammad Shahid Iqbal
    Imran, Tahir
    Mamdouh, Jamjoom Mona
    PEERJ COMPUTER SCIENCE, 2023, 9
  • [3] Discourse Structure Extraction from Pre-Trained and Fine-Tuned Language Models in Dialogues
    Li, Chuyuan
    Huber, Patrick
    Xiao, Wen
    Amblard, Maxime
    Braud, Chloe
    Carenini, Giuseppe
    17TH CONFERENCE OF THE EUROPEAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, EACL 2023, 2023, : 2562 - 2579
  • [4] Mining Insights from Large-Scale Corpora Using Fine-Tuned Language Models
    Palakodety, Shriphani
    KhudaBukhsh, Ashiqur R.
    Carbonell, Jaime G.
    ECAI 2020: 24TH EUROPEAN CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2020, 325 : 1890 - 1897
  • [5] Prompt Engineering Approach Study for Supervised Fine-Tuned (SFT) Large Language Models (LLMs) in Spacecraft Fault Diagnosis
    Xia, Qing
    Zhao, Haotian
    Liu, Ming
    2024 3RD CONFERENCE ON FULLY ACTUATED SYSTEM THEORY AND APPLICATIONS, FASTA 2024, 2024, : 819 - 824
  • [6] Performance comparison of retrieval-augmented generation and fine-tuned large language models for construction safety management knowledge retrieval
    Lee, Jungwon
    Ahn, Seungjun
    Kim, Daeho
    Kim, Dongkyun
    AUTOMATION IN CONSTRUCTION, 2024, 168
  • [7] OptimalMEE: Optimizing Large Language Models for Medical Event Extraction Through Fine-Tuning and Post-hoc Verification
    Sun, Yaoqian
    Wu, Dan
    Chen, Zikang
    Cai, Hailing
    An, Jiye
    ARTIFICIAL INTELLIGENCE IN MEDICINE, PT I, AIME 2024, 2024, 14844 : 303 - 311
  • [8] Just Ask for Calibration: Strategies for Eliciting Calibrated Confidence Scores from Language Models Fine-Tuned with Human Feedback
    Tian, Katherine
    Mitchell, Eric
    Zhou, Allan
    Sharma, Archit
    Rafailov, Rafael
    Yao, Huaxiu
    Finn, Chelsea
    Manning, Christopher D.
    2023 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING, EMNLP 2023, 2023, : 5433 - 5442
  • [9] From large language models to small logic programs: building global explanations from disagreeing local post-hoc explainers
    Agiollo, Andrea
    Siebert, Luciano Cavalcante
    Murukannaiah, Pradeep K.
    Omicini, Andrea
    AUTONOMOUS AGENTS AND MULTI-AGENT SYSTEMS, 2024, 38 (02)