Exploring Capabilities of Large Language Models such as ChatGPT in Radiation Oncology

Cited by: 14
Authors
Dennstadt, Fabio [1 ]
Hastings, Janna [2 ,3 ]
Putora, Paul Martin [1 ,4 ,5 ]
Vu, Erwin [1 ]
Fischer, Galina F. [1 ]
Suveg, Krisztian [1 ]
Glatzer, Markus [1 ]
Riggenbach, Elena [4 ,5 ]
Ha, Hong-Linh [4 ,5 ]
Cihoric, Nikola [4 ,5 ]
Affiliations
[1] Kantonsspital St Gallen, Dept Radiat Oncol, St Gallen, Switzerland
[2] Univ St Gallen, Sch Med, St Gallen, Switzerland
[3] Univ Zurich, Inst Implementat Sci Hlth Care, Zurich, Switzerland
[4] Bern Univ Hosp, Dept Radiat Oncol, Inselspital, Bern, Switzerland
[5] Univ Bern, Bern, Switzerland
Keywords
ONCOLOGY; SYSTEMS;
DOI
10.1016/j.adro.2023.101400
Chinese Library Classification
R73 [Oncology];
Discipline Code
100214;
Abstract
Purpose: Technological progress in machine learning and natural language processing has led to the development of large language models (LLMs) capable of producing well-formed text responses and providing natural language access to knowledge. Modern conversational LLMs such as ChatGPT have shown remarkable capabilities across a variety of fields, including medicine. These models may assess even highly specialized medical knowledge within specific disciplines, such as radiation therapy. We conducted an exploratory study to examine the capabilities of ChatGPT to answer questions in radiation therapy.
Methods and Materials: A set of multiple-choice questions on general clinical, physics, and biology knowledge in radiation oncology, as well as a set of open-ended questions, was created. These were given as prompts to the LLM ChatGPT, and the answers were collected and analyzed. For the multiple-choice questions, we assessed how many of the model's answers could be clearly assigned to one of the allowed answer options and determined the proportion of correct answers. For the open-ended questions, independent blinded radiation oncologists evaluated the correctness and usefulness of the answers on a 5-point Likert scale. Furthermore, the evaluators were asked to provide suggestions for improving the quality of the answers.
Results: For 70 multiple-choice questions, ChatGPT gave valid answers in 66 cases (94.3%). In 60.61% of the valid answers, the selected answer was correct (50.0% of clinical questions, 78.6% of physics questions, and 58.3% of biology questions). For 25 open-ended questions, 12 of ChatGPT's answers were rated "acceptable," "good," or "very good" regarding both correctness and helpfulness by all 6 participating radiation oncologists. Overall, regarding correctness and helpfulness respectively, the answers were rated "very good" in 29.3% and 28%, "good" in 28% and 29.3%, "acceptable" in 19.3% and 19.3%, "bad" in 9.3% and 9.3%, and "very bad" in 14% and 14% of cases.
Conclusions: Modern conversational LLMs such as ChatGPT can provide satisfying answers to many relevant questions in radiation therapy. As they still fall short of consistently providing correct information, relying on them to obtain medical information remains problematic. As LLMs improve further, they are expected to have an increasing impact not only on general society but also on clinical practice, including radiation oncology. (c) 2023 The Author(s). Published by Elsevier Inc. on behalf of the American Society for Radiation Oncology. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
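The Methods describe prompting ChatGPT with multiple-choice questions, checking how many replies can be mapped to one of the allowed options, and computing the proportion of correct answers among the valid ones. The following minimal Python sketch illustrates such a pipeline; it is not the authors' code, and the OpenAI client call, the model name, and the single example question with its answer key are assumptions made purely for illustration.

import re
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Hypothetical answer key; the study's actual questions are not reproduced here.
questions = [
    {
        "prompt": "In external beam radiation therapy, which unit expresses absorbed dose?\n"
                  "A) Sievert\nB) Gray\nC) Becquerel\nD) Curie",
        "correct": "B",
    },
]

valid = 0
correct = 0
for q in questions:
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",  # placeholder; the paper does not name an API model
        messages=[{"role": "user",
                   "content": q["prompt"] + "\nAnswer with a single letter (A-D)."}],
    )
    reply = response.choices[0].message.content.strip()
    # A reply counts as "valid" only if it can be clearly mapped to one allowed option.
    match = re.match(r"\(?([A-D])\b", reply)
    if match:
        valid += 1
        if match.group(1) == q["correct"]:
            correct += 1

print(f"valid answers: {valid}/{len(questions)} ({valid / len(questions):.1%})")
if valid:
    print(f"correct among valid answers: {correct / valid:.1%}")

The open-ended questions in the study were instead evaluated by blinded radiation oncologists on a Likert scale, which a scripted pipeline like this cannot replace.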
Pages: 11
Related papers
50 items in total
  • [41] ChatGPT on ECT: Can Large Language Models Support Psychoeducation?
    Lundin, Robert M.
    Berk, Michael
    Ostergaard, Soren Dinesen
    JOURNAL OF ECT, 2023, 39 (03) : 130 - 133
  • [42] ChatGPT and Gemini large language models for pharmacometrics with NONMEM: comment
    Daungsupawong, Hinpetch
    Wiwanitkit, Viroj
    JOURNAL OF PHARMACOKINETICS AND PHARMACODYNAMICS, 2024, 51 (04) : 303 - 304
  • [43] Can ChatGPT Truly Overcome Other Large Language Models?
    Ray, Partha
    CANADIAN ASSOCIATION OF RADIOLOGISTS JOURNAL-JOURNAL DE L ASSOCIATION CANADIENNE DES RADIOLOGISTES, 2024, 75 (02) : 429 - 429
  • [44] Exploring infection clinicians' perceptions of bias in Large Language Models (LLMs) like ChatGPT: A deep learning study
    Praveen, S. V.
    Vijaya, S.
    JOURNAL OF INFECTION, 2023, 87 (06) : 579 - 580
  • [45] Exploring Large Language Models for Classical Philology
    Riemenschneider, Frederick
    Frank, Anette
    PROCEEDINGS OF THE 61ST ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2023): LONG PAPERS, VOL 1, 2023, : 15181 - 15199
  • [46] Exploring Mathematical Conjecturing with Large Language Models
    Johansson, Moa
    Smallbone, Nicholas
    NEURAL-SYMBOLIC LEARNING AND REASONING 2023, NESY 2023, 2023,
  • [47] Exploring Length Generalization in Large Language Models
    Anil, Cem
    Wu, Yuhuai
    Andreassen, Anders
    Lewkowycz, Aitor
    Misra, Vedant
    Ramasesh, Vinay
    Slone, Ambrose
    Gur-Ari, Guy
    Dyer, Ethan
    Neyshabur, Behnam
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022,
  • [48] Can ChatGPT Detect Intent? Evaluating Large Language Models for Spoken Language Understanding
    He, Mutian
    Garner, Philip N.
    INTERSPEECH 2023, 2023, : 1109 - 1113
  • [49] Language-Specific Neurons: The Key to Multilingual Capabilities in Large Language Models
    Tang, Tianyi
    Luo, Wenyang
    Huang, Haoyang
    Zhang, Dongdong
    Wang, Xiaolei
    Zhao, Wayne Xin
    Wei, Furu
    Wen, Ji-Rong
    PROCEEDINGS OF THE 62ND ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, VOL 1: LONG PAPERS, 2024, : 5701 - 5715
  • [50] Unleashing the AI revolution: exploring the capabilities and challenges of large language models and text-to-image AI programs
    Youssef, A.
    ULTRASOUND IN OBSTETRICS & GYNECOLOGY, 2023, 62 (02) : 308 - 312