Adaptation with Self-Evaluation to Improve Selective Prediction in LLMs

Cited by: 0
Authors
Chen, Jiefeng [1, 3, 4]
Yoon, Jinsung [2]
Ebrahimi, Sayna [2]
Arik, Sercan O. [2]
Pfister, Tomas [2]
Jha, Somesh [1, 2]
Affiliations
[1] Univ Wisconsin Madison, Madison, WI 53706 USA
[2] Google LLC, Mountain View, CA USA
[3] Google, Mountain View, CA USA
[4] Amazon, Seattle, WA USA
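Funding
National Science Foundation (USA)
Keywords
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract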
Large language models (LLMs) have recently shown great advances in a variety of tasks, including natural language understanding and generation. However, their use in high-stakes decision-making scenarios is still limited due to the potential for errors. Selective prediction is a technique that can be used to improve the reliability of the LLMs by allowing them to abstain from making predictions when they are unsure of the answer. In this work, we propose a novel framework for adaptation with self-evaluation to improve the selective prediction performance of LLMs. Our framework is based on the idea of using parameter-efficient tuning to adapt the LLM to the specific task at hand while improving its ability to perform self-evaluation. We evaluate our method on a variety of question-answering (QA) datasets and show that it outperforms state-of-the-art selective prediction methods. For example, on the CoQA benchmark, our method improves the AUACC from 91.23% to 92.63% and improves the AUROC from 74.61% to 80.25%.
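The abstract's two ingredients, abstaining below a confidence threshold and scoring the result with AUACC and AUROC, can be illustrated with a minimal sketch. It does not implement the paper's adaptation framework. It assumes each answer already comes with a scalar confidence score, and it assumes AUACC means the average accuracy over all coverage levels when answers are ranked by confidence. The function names `selective_predict`, `auroc`, and `auacc` are hypothetical.

```python
from typing import Optional, Sequence


def selective_predict(answer: str, score: float, threshold: float) -> Optional[str]:
    """Return the answer if its confidence clears the threshold, else abstain (None)."""
    return answer if score >= threshold else None


def auroc(scores: Sequence[float], correct: Sequence[bool]) -> float:
    """Probability that a correct answer gets a higher score than an incorrect one.

    Ties count as half a win. Computed by brute force over all pairs.
    """
    pos = [s for s, c in zip(scores, correct) if c]
    neg = [s for s, c in zip(scores, correct) if not c]
    if not pos or not neg:
        raise ValueError("need at least one correct and one incorrect example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def auacc(scores: Sequence[float], correct: Sequence[bool]) -> float:
    """Assumed definition: mean accuracy over coverage levels 1/n, 2/n, ..., 1.

    Examples are ranked by descending confidence; at coverage k/n only the
    k most confident answers are kept and the rest are abstained on.
    """
    if not scores:
        raise ValueError("need at least one example")
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits = 0
    accuracies = []
    for k, i in enumerate(order, start=1):
        hits += int(correct[i])
        accuracies.append(hits / k)
    return sum(accuracies) / len(accuracies)


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    scores = [0.9, 0.8, 0.3, 0.1]
    correct = [True, True, False, True]
    print("AUROC:", auroc(scores, correct))
    print("AUACC:", auacc(scores, correct))
    print("Prediction:", selective_predict("Paris", 0.3, threshold=0.5))
```

Under these assumptions, a better self-evaluation score raises both metrics by ranking correct answers above incorrect ones, which is the quantity the reported CoQA numbers measure.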
Pages: 5190-5213
Page count: 24