Enhancing chatbot performance for imaging recommendations: Leveraging GPT-4 and context-awareness for trustworthy clinical guidance

Authors
Rau, Alexander [1, 2]
Bamberg, Fabian [1]
Fink, Anna [1]
Tran, Phuong Hien [1]
Reisert, Marco [3, 4]
Russe, Maximilian F. [1]
Affiliations
[1] Univ Freiburg, Fac Med, Med Ctr, Dept Diagnost & Intervent Radiol, D-79106 Freiburg, Germany
[2] Univ Freiburg, Fac Med, Med Ctr, Dept Neuroradiol, D-79106 Freiburg, Germany
[3] Univ Freiburg, Fac Med, Med Ctr, Med Phys, Dept Diagnost & Intervent Radiol, D-79106 Freiburg, Germany
[4] Univ Freiburg, Fac Med, Med Ctr, Dept Stereotact & Funct Neurosurg, D-79106 Freiburg, Germany
Keywords
Large Language Model; Chatbot; Trust; ACR Guidelines; Clinical Decision Support; Auditability
DOI
10.1016/j.ejrad.2024.111756
Chinese Library Classification (CLC) number
R8 [Special medicine]; R445 [Diagnostic imaging]
Discipline classification codes
1002; 100207; 1009
Abstract
Purpose: To investigate whether GPT-4 improves the accuracy, consistency, and trustworthiness of a context-aware chatbot that uses semantic similarity processing to provide personalized imaging recommendations from the American College of Radiology (ACR) Appropriateness Criteria documents. In addition, we sought to make the output auditable by revealing the information source on which each decision relies.
Material and Methods: We refined an existing chatbot that incorporated specialized knowledge of the ACR guidelines by upgrading it from GPT-3.5-Turbo to its successor, OpenAI's GPT-4, using the latest version of LlamaIndex, and improving the prompting strategy. We compared this chatbot with its previous version, with generic GPT-3.5-Turbo and GPT-4, and with general radiologists in applying the ACR appropriateness guidelines.
Results: The refined context-aware chatbot outperformed the previous version based on GPT-3.5-Turbo, the generic chatbots GPT-3.5-Turbo and GPT-4, and general radiologists in providing "usually or may be appropriate" recommendations according to the ACR guidelines (all p < 0.001). It also outperformed GPT-3.5-Turbo and general radiologists with respect to "usually appropriate" recommendations (both p < 0.001). Moreover, consistency of correct answers was higher, with 78% consistently correct "usually appropriate" answers and 94% consistently correct "usually or may be appropriate" recommendations. In all cases, the same source documents were selected, ensuring transparency.
Conclusion: Our study demonstrates the importance of context awareness in ensuring that appropriate knowledge is used, and it proposes a strategy for providing transparency to enhance trust in chatbot-based outputs. The improvements in accuracy, consistency, and source transparency address trust issues and strengthen the clinical decision support process.
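The abstract describes retrieving guideline text by semantic similarity and exposing which source document an answer relies on. The sketch below illustrates that idea only; it is not the authors' LlamaIndex pipeline. It swaps in simple word-count vectors with cosine similarity instead of learned embeddings, and the document ids and texts in `DOCUMENTS` are hypothetical placeholders, not real ACR content.

```python
import math
import re
from collections import Counter

# Hypothetical placeholder corpus: ids stand in for guideline documents and the
# texts are illustrative only. They are NOT real ACR recommendations.
DOCUMENTS = {
    "doc_headache": "headache sudden onset severe neurologic deficit imaging",
    "doc_low_back_pain": "low back pain radiculopathy red flags imaging",
    "doc_abdominal_pain": "acute abdominal pain fever imaging",
}


def tokenize(text: str) -> Counter:
    """Lower-case word counts, a crude stand-in for embedding vectors."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, k: int = 1) -> list[tuple[str, float]]:
    """Return the k most similar documents as (id, score), highest first."""
    q = tokenize(query)
    scored = [(doc_id, cosine(q, tokenize(text))) for doc_id, text in DOCUMENTS.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


def build_prompt(query: str) -> tuple[str, list[str]]:
    """Build a context-augmented prompt and return the source ids it used,
    so the answer can be audited against the retrieved documents."""
    sources = [doc_id for doc_id, _ in retrieve(query, k=1)]
    context = "\n".join(f"[{doc_id}] {DOCUMENTS[doc_id]}" for doc_id in sources)
    prompt = (
        "Answer using only the context below and name the source document.\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )
    return prompt, sources


prompt, sources = build_prompt("patient with severe sudden headache")
print(sources)
```

Because retrieval here is deterministic, repeating the same query always selects the same source, which loosely mirrors the reported observation that identical source documents were chosen in all cases. The returned `sources` list, rather than the generated text, is what makes the answer auditable.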
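The abstract reports consistency as the share of "consistently correct" answers across repeated queries but does not define the metric here. One plausible reading, assumed for this sketch, is the fraction of questions answered correctly in every repeated run. The function name `consistent_correct_rate` and the toy data are hypothetical and are not taken from the study.

```python
def consistent_correct_rate(runs_by_question: dict[str, list[bool]]) -> float:
    """Fraction of questions answered correctly in every repeated run.

    Assumption: "consistently correct" means correct in all runs; the
    study's exact definition may differ.
    """
    if not runs_by_question:
        raise ValueError("no questions supplied")
    consistent = sum(1 for runs in runs_by_question.values() if runs and all(runs))
    return consistent / len(runs_by_question)


# Made-up toy data (not from the study): three runs per question.
toy = {
    "q1": [True, True, True],
    "q2": [True, False, True],
    "q3": [True, True, True],
    "q4": [False, False, False],
}
print(f"{consistent_correct_rate(toy):.0%}")  # 50%
```

Under this definition a question that is correct in only some runs (like `q2`) counts against consistency, which is stricter than averaging per-run accuracy.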
Pages: 5