Just Ask for Calibration: Strategies for Eliciting Calibrated Confidence Scores from Language Models Fine-Tuned with Human Feedback

Cited by: 0
Authors:
Tian, Katherine [1 ]
Mitchell, Eric [2 ]
Zhou, Allan [2 ]
Sharma, Archit [2 ]
Rafailov, Rafael [2 ]
Yao, Huaxiu [2 ]
Finn, Chelsea [2 ]
Manning, Christopher D. [2 ]
Affiliations:
[1] Harvard Univ, Cambridge, MA 02138 USA
[2] Stanford Univ, Stanford, CA 94305 USA
DOI: Not available
CLC Classification: TP18 [Artificial Intelligence Theory]
Subject Classification Codes: 081104; 0812; 0835; 1405
Abstract:
A trustworthy real-world prediction system should produce well-calibrated confidence scores; that is, its confidence in an answer should be indicative of the likelihood that the answer is correct, enabling deferral to an expert in cases of low-confidence predictions. Recent studies have shown that unsupervised pre-training produces large language models (LMs) whose conditional probabilities are remarkably well-calibrated. However, the most widely used LMs are fine-tuned with reinforcement learning from human feedback (RLHF-LMs), and some studies have suggested that RLHF-LMs produce conditional probabilities that are very poorly calibrated. In light of this perceived weakness, we conduct a broad evaluation of methods for extracting confidence scores from RLHF-LMs. For RLHF-LMs such as ChatGPT, GPT-4, and Claude, we find that verbalized confidences emitted as output tokens are typically better calibrated than the model's conditional probabilities on the TriviaQA, SciQ, and TruthfulQA benchmarks, often reducing the expected calibration error by a relative 50%.
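The abstract compares verbalized confidences against the model's conditional token probabilities using expected calibration error (ECE). As a rough illustration of that metric, below is a minimal Python sketch of binned ECE over (confidence, correctness) pairs; the prompt template, the bin count, and the equal-width binning scheme are illustrative assumptions, not the paper's exact protocol.

```python
# Minimal sketch of binned expected calibration error (ECE).
# The 10 equal-width bins are an illustrative assumption, not necessarily
# the paper's exact evaluation setup.

# Hypothetical prompt for eliciting a verbalized confidence
# (illustrative wording, not the paper's exact prompt):
VERBALIZED_CONFIDENCE_PROMPT = (
    "Answer the following question, then state the probability (0.0-1.0) "
    "that your answer is correct.\n"
    "Question: {question}\n"
    "Answer:"
)

def expected_calibration_error(confidences, correct, n_bins=10):
    """Weighted average over bins of |mean confidence - empirical accuracy|."""
    n = len(confidences)
    ece = 0.0
    for b in range(n_bins):
        lo, hi = b / n_bins, (b + 1) / n_bins
        # Bucket predictions by stated confidence; the lowest bin is closed at 0.
        idx = [i for i, c in enumerate(confidences)
               if (lo < c or b == 0) and c <= hi]
        if not idx:
            continue
        avg_conf = sum(confidences[i] for i in idx) / len(idx)
        accuracy = sum(correct[i] for i in idx) / len(idx)
        ece += (len(idx) / n) * abs(avg_conf - accuracy)
    return ece

# Example: a model that always says 0.9 but is right half the time is badly
# calibrated (ECE = 0.4); saying 0.5 instead would give ECE = 0.0.
print(expected_calibration_error([0.9, 0.9, 0.9, 0.9], [1, 0, 1, 0]))  # 0.4
print(expected_calibration_error([0.5, 0.5, 0.5, 0.5], [1, 0, 1, 0]))  # 0.0
```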
Pages: 5433-5442 (10 pages)