Adaptation with Self-Evaluation to Improve Selective Prediction in LLMs

Authors
Chen, Jiefeng [1,3,4]
Yoon, Jinsung [2]
Ebrahimi, Sayna [2]
Arik, Sercan O. [2]
Pfister, Tomas [2]
Jha, Somesh [1,2]
Affiliations
[1] Univ Wisconsin Madison, Madison, WI 53706 USA
[2] Google LLC, Mountain View, CA USA
[3] Google, Mountain View, CA USA
[4] Amazon, Seattle, WA USA
Funding
U.S. National Science Foundation
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Large language models (LLMs) have recently shown great advances in a variety of tasks, including natural language understanding and generation. However, their use in high-stakes decision-making scenarios is still limited due to the potential for errors. Selective prediction can improve the reliability of LLMs by allowing them to abstain from making predictions when they are unsure of the answer. In this work, we propose a novel framework for adaptation with self-evaluation to improve the selective prediction performance of LLMs. Our framework uses parameter-efficient tuning to adapt the LLM to the target task while also improving its ability to perform self-evaluation. We evaluate our method on a variety of question-answering (QA) datasets and show that it outperforms state-of-the-art selective prediction methods. For example, on the CoQA benchmark, our method improves the area under the accuracy-coverage curve (AUACC) from 91.23% to 92.63% and the area under the receiver operating characteristic curve (AUROC) from 74.61% to 80.25%.
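To make the selective-prediction setup and its two metrics concrete, here is a minimal Python sketch, assuming each answer already has a scalar confidence score (for example, a self-evaluation score) and a binary correctness label. It is not the authors' implementation: the function names are hypothetical, and AUACC is discretized here as the mean selective accuracy at coverage levels 1/n, 2/n, ..., 1, which is one common choice rather than necessarily the paper's exact formula. AUROC uses the pairwise (Mann-Whitney) formulation, with ties counted as one half.

```python
from math import nan
from typing import List, Sequence, Tuple


def selective_predict(scores: Sequence[float], threshold: float) -> List[bool]:
    """Return True where the model answers and False where it abstains."""
    return [s >= threshold for s in scores]


def coverage_and_accuracy(
    scores: Sequence[float], correct: Sequence[bool], threshold: float
) -> Tuple[float, float]:
    """Coverage = fraction answered; selective accuracy = accuracy on answered items."""
    answered = [c for s, c in zip(scores, correct) if s >= threshold]
    coverage = len(answered) / len(scores)
    # Selective accuracy is undefined when the model abstains on everything.
    accuracy = sum(answered) / len(answered) if answered else nan
    return coverage, accuracy


def auacc(scores: Sequence[float], correct: Sequence[bool]) -> float:
    """Area under the accuracy-coverage curve (discretized; an assumption).

    Answers are ranked by descending confidence. The accuracy of the top-k
    answers is used as the curve's value on a coverage interval of width 1/n,
    so the area equals the mean of those top-k accuracies. Ties keep input
    order because sorted() is stable even with reverse=True.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits = 0
    total = 0.0
    for k, i in enumerate(order, start=1):
        hits += int(correct[i])
        total += hits / k  # selective accuracy at coverage k/n
    return total / len(order)


def auroc(scores: Sequence[float], correct: Sequence[bool]) -> float:
    """Probability that a random correct answer outscores a random incorrect one."""
    pos = [s for s, c in zip(scores, correct) if c]
    neg = [s for s, c in zip(scores, correct) if not c]
    if not pos or not neg:
        raise ValueError("AUROC needs both correct and incorrect answers")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy data made up for illustration only (not results from the paper).
    scores = [0.9, 0.8, 0.7, 0.6, 0.4]
    correct = [True, True, False, True, False]
    print(selective_predict(scores, 0.65))  # [True, True, True, False, False]
    print(round(auacc(scores, correct), 4))  # 0.8033
    print(round(auroc(scores, correct), 4))  # 0.8333
```

Raising the threshold lowers coverage and, if the confidence score ranks correct answers above incorrect ones, raises selective accuracy. AUACC summarizes that trade-off over all thresholds, while AUROC measures only how well the score separates correct from incorrect answers.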
Pages: 5190-5213
Number of pages: 24