Exploring Capabilities of Large Language Models such as ChatGPT in Radiation Oncology

Cited: 14
Authors
Dennstadt, Fabio [1 ]
Hastings, Janna [2 ,3 ]
Putora, Paul Martin [1 ,4 ,5 ]
Vu, Erwin [1 ]
Fischer, Galina F. [1 ]
Suveg, Krisztian [1 ]
Glatzer, Markus [1 ]
Riggenbach, Elena [4 ,5 ]
Ha, Hong-Linh [4 ,5 ]
Cihoric, Nikola [4 ,5 ]
Affiliations
[1] Kantonsspital St Gallen, Dept Radiat Oncol, St Gallen, Switzerland
[2] Univ St Gallen, Sch Med, St Gallen, Switzerland
[3] Univ Zurich, Inst Implementat Sci Hlth Care, Zurich, Switzerland
[4] Bern Univ Hosp, Dept Radiat Oncol, Inselspital, Bern, Switzerland
[5] Univ Bern, Bern, Switzerland
Keywords
ONCOLOGY; SYSTEMS;
DOI
10.1016/j.adro.2023.101400
Chinese Library Classification (CLC)
R73 [Oncology]
Discipline classification code
100214
Abstract
Purpose: Technological progress in machine learning and natural language processing has led to the development of large language models (LLMs), capable of producing well-formed text responses and providing natural language access to knowledge. Modern conversational LLMs such as ChatGPT have shown remarkable capabilities across a variety of fields, including medicine. These models may assess even highly specialized medical knowledge within specific disciplines, such as radiation therapy. We conducted an exploratory study to examine the capabilities of ChatGPT to answer questions in radiation therapy.
Methods and Materials: A set of multiple-choice questions about clinical, physics, and biology general knowledge in radiation oncology, as well as a set of open-ended questions, were created. These were given as prompts to the LLM ChatGPT, and the answers were collected and analyzed. For the multiple-choice questions, we checked how many of the model's answers could be clearly assigned to one of the allowed multiple-choice answers and determined the proportion of correct answers. For the open-ended questions, independent blinded radiation oncologists evaluated the quality of the answers regarding correctness and usefulness on a 5-point Likert scale. Furthermore, the evaluators were asked to provide suggestions for improving the quality of the answers.
Results: For 70 multiple-choice questions, ChatGPT gave valid answers in 66 cases (94.3%). In 60.61% of the valid answers, the selected answer was correct (50.0% of clinical questions, 78.6% of physics questions, and 58.3% of biology questions). For 25 open-ended questions, 12 answers of ChatGPT were considered "acceptable," "good," or "very good" regarding both correctness and helpfulness by all 6 participating radiation oncologists. Overall, regarding correctness/helpfulness, the answers were considered "very good" in 29.3%/28%, "good" in 28%/29.3%, "acceptable" in 19.3%/19.3%, "bad" in 9.3%/9.3%, and "very bad" in 14%/14% of cases.
Conclusions: Modern conversational LLMs such as ChatGPT can provide satisfying answers to many relevant questions in radiation therapy. As they still fall short of consistently providing correct information, however, it is problematic to rely on them for obtaining medical information. As LLMs improve further, they are expected to have an increasing impact not only on general society but also on clinical practice, including radiation oncology. (c) 2023 The Author(s). Published by Elsevier Inc. on behalf of the American Society for Radiation Oncology. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
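The Methods describe giving multiple-choice questions as prompts to ChatGPT, checking whether each answer can be clearly assigned to one of the allowed options, and computing the proportion of correct answers among the valid ones. The sketch below illustrates one way such a loop could be implemented; it is not the authors' pipeline (the study does not describe an API workflow). It assumes the OpenAI Python client (openai>=1.0) with an API key in OPENAI_API_KEY; the model name, the example question, and the letter-extraction rule are illustrative assumptions. Answers from which no single option letter can be extracted are counted as not clearly assignable, mirroring the validity check in the study.

```python
# Minimal sketch (not the authors' pipeline): send multiple-choice questions to a
# conversational LLM and tally valid and correct answers.
import re
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Hypothetical radiation-oncology-style MCQs: (question, options, correct letter).
questions = [
    (
        "Which particle is most commonly used in external beam radiation therapy?",
        {"A": "Photon", "B": "Neutrino", "C": "Muon", "D": "Tau"},
        "A",
    ),
]

def ask_mcq(question: str, options: dict[str, str]) -> str | None:
    """Send one multiple-choice prompt and try to extract a single answer letter."""
    option_text = "\n".join(f"{k}) {v}" for k, v in options.items())
    prompt = (
        f"{question}\n{option_text}\n"
        "Answer with the letter of the single best option."
    )
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",  # assumption; the study used ChatGPT
        messages=[{"role": "user", "content": prompt}],
    )
    reply = response.choices[0].message.content or ""
    match = re.search(r"\b([A-D])\b", reply)
    return match.group(1) if match else None  # None = not clearly assignable

valid, correct = 0, 0
for text, options, answer in questions:
    choice = ask_mcq(text, options)
    if choice is not None:
        valid += 1
        correct += choice == answer
print(f"valid: {valid}/{len(questions)}; correct among valid: {correct}")
```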
Pages: 11