Assessing the Responses of Large Language Models (ChatGPT-4, Gemini, and Microsoft Copilot) to Frequently Asked Questions in Breast Imaging: A Study on Readability and Accuracy

Cited by: 11
Authors
Tepe, Murat [1 ]
Emekli, Emre [2 ]
Affiliations
[1] Mediclinic City Hospital, Radiology, Dubai, United Arab Emirates
[2] Eskisehir Osmangazi University, Health Practice & Research Hospital, Radiology, Eskisehir, Türkiye
Keywords
artificial intelligence; breast imaging; Microsoft Copilot; Gemini; ChatGPT; large language models
DOI
10.7759/cureus.59960
Chinese Library Classification
R5 [Internal Medicine]
Subject Classification Codes
1002; 100201
Abstract
Background: Large language models (LLMs) such as ChatGPT-4, Gemini, and Microsoft Copilot have been instrumental in various domains, including healthcare, where they enhance health literacy and aid in patient decision-making. Given the complexities involved in breast imaging procedures, accurate and comprehensible information is vital for patient engagement and compliance. This study aims to evaluate the readability and accuracy of the information provided by three prominent LLMs, ChatGPT-4, Gemini, and Microsoft Copilot, in response to frequently asked questions in breast imaging, assessing their potential to improve patient understanding and facilitate healthcare communication.

Methodology: We collected the most common questions about breast imaging from clinical practice and posed them to the three LLMs. The responses were analyzed for readability using the Flesch Reading Ease and Flesch-Kincaid Grade Level tests, and for accuracy using a radiologist-developed Likert-type scale.

Results: The study found significant variations among the LLMs. Gemini and Microsoft Copilot scored higher on the readability scales (p < 0.001), indicating their responses were easier to understand. In contrast, ChatGPT-4 demonstrated greater accuracy in its responses (p < 0.001).

Conclusions: While LLMs such as ChatGPT-4 show promise in providing accurate responses, readability issues may limit their utility in patient education. Conversely, Gemini and Microsoft Copilot, despite being less accurate, are more accessible to a broader patient audience. Ongoing adjustments and evaluations of these models are essential to ensure they meet the diverse needs of patients, emphasizing the need for continuous improvement and oversight in the deployment of artificial intelligence technologies in healthcare.
Pages: 9
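For context, the two readability metrics named in the abstract are fixed linear formulas over average sentence length (words per sentence) and average word length (syllables per word). The sketch below computes both; the regex-based syllable counter is a simplifying assumption for illustration only, since the study does not describe its tooling and dedicated readability packages use more careful heuristics.

```python
# Minimal sketch of the Flesch Reading Ease (FRE) and
# Flesch-Kincaid Grade Level (FKGL) formulas used in the study.
import re

def count_syllables(word: str) -> int:
    """Rough English syllable estimate: count runs of vowels, minimum one.
    This heuristic is an illustrative assumption, not the study's method."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def readability(text: str) -> tuple[float, float]:
    """Return (FRE, FKGL) for a block of text."""
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    n_words = max(1, len(words))
    n_syllables = sum(count_syllables(w) for w in words)
    asl = n_words / sentences      # average sentence length
    asw = n_syllables / n_words    # average syllables per word
    fre = 206.835 - 1.015 * asl - 84.6 * asw
    fkgl = 0.39 * asl + 11.8 * asw - 15.59
    return fre, fkgl

if __name__ == "__main__":
    # Hypothetical patient-facing answer, for demonstration only.
    sample = ("A mammogram is an X-ray picture of the breast. "
              "It helps find cancer early, before you can feel a lump.")
    fre, fkgl = readability(sample)
    print(f"Flesch Reading Ease: {fre:.1f}, Flesch-Kincaid Grade: {fkgl:.1f}")
```

Higher FRE scores indicate easier text, while FKGL maps the same two ratios onto US school grade levels; this is why the two tests generally move together, as reflected in the study's readability rankings.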