Adaptation with Self-Evaluation to Improve Selective Prediction in LLMs

Cited by: 0

Authors:

Chen, Jiefeng [1,3,4]

Yoon, Jinsung [2]

Ebrahimi, Sayna [2]

Arik, Sercan O. [2]

Pfister, Tomas [2]

Jha, Somesh [1,2]

Affiliations:

[1] Univ Wisconsin Madison, Madison, WI 53706 USA

[2] Google LLC, Mountain View, CA USA

[3] Google, Mountain View, CA USA

[4] Amazon, Seattle, WA USA

Source:

FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS - EMNLP 2023 | 2023

Funding:

U.S. National Science Foundation

Keywords:

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Large language models (LLMs) have recently shown great advances in a variety of tasks, including natural language understanding and generation. However, their use in high-stakes decision-making scenarios is still limited due to the potential for errors. Selective prediction is a technique that can be used to improve the reliability of the LLMs by allowing them to abstain from making predictions when they are unsure of the answer. In this work, we propose a novel framework for adaptation with self-evaluation to improve the selective prediction performance of LLMs. Our framework is based on the idea of using parameter-efficient tuning to adapt the LLM to the specific task at hand while improving its ability to perform self-evaluation. We evaluate our method on a variety of question-answering (QA) datasets and show that it outperforms state-of-the-art selective prediction methods. For example, on the CoQA benchmark, our method improves the AUACC from 91.23% to 92.63% and improves the AUROC from 74.61% to 80.25%.
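
The abstract names selective prediction and two metrics, AUACC and AUROC, without defining them. The following is a minimal sketch, not the authors' implementation. It assumes that AUACC is the area under the accuracy-coverage curve (approximated here as the mean accuracy over the k most confident predictions, for every k) and that AUROC measures how well the confidence score separates correct answers from incorrect ones. All function names and the toy data are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple


def selective_predict(answer: str, score: float, threshold: float) -> Optional[str]:
    """Return the answer if its confidence score clears the threshold, else abstain (None)."""
    return answer if score >= threshold else None


def accuracy_coverage_curve(
    scores: Sequence[float], correct: Sequence[bool]
) -> List[Tuple[float, float]]:
    """(coverage, accuracy) pairs when only the k most confident answers are kept."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    curve = []
    hits = 0
    for k, i in enumerate(order, start=1):
        hits += int(correct[i])
        curve.append((k / len(scores), hits / k))
    return curve


def auacc(scores: Sequence[float], correct: Sequence[bool]) -> float:
    """Assumed definition: mean accuracy across all coverage levels."""
    curve = accuracy_coverage_curve(scores, correct)
    return sum(acc for _, acc in curve) / len(curve)


def auroc(scores: Sequence[float], correct: Sequence[bool]) -> float:
    """Probability that a correct answer outscores an incorrect one (ties count 0.5)."""
    pos = [s for s, c in zip(scores, correct) if c]
    neg = [s for s, c in zip(scores, correct) if not c]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one correct and one incorrect prediction")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): confidence scores and whether each answer was correct.
toy_scores = [0.9, 0.8, 0.6, 0.3]
toy_correct = [True, True, False, True]
print(selective_predict("Paris", 0.9, threshold=0.5))  # kept
print(selective_predict("Paris", 0.2, threshold=0.5))  # abstains
print(auacc(toy_scores, toy_correct), auroc(toy_scores, toy_correct))
```

Under these assumptions, a better confidence score raises both numbers: it ranks correct answers above incorrect ones (AUROC) and keeps accuracy high as coverage grows (AUACC).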


Pages: 5190-5213

Page count: 24

