Assessing ChatGPT's Mastery of Bloom's Taxonomy Using Psychosomatic Medicine Exam Questions: Mixed-Methods Study

Cited by: 15
Authors
Herrmann-Werner, Anne [1, 2]
Festl-Wietek, Teresa [1]
Holderried, Friederike [1, 3]
Herschbach, Lea [1]
Griewatz, Jan [1]
Masters, Ken [4]
Zipfel, Stephan [2]
Mahling, Moritz [1, 5]
Affiliations
[1] Univ Tubingen, Tubingen Inst Med Educ, Fac Med, Elfriede Aulhorn Str 10, D-72076 Tubingen, Germany
[2] Univ Hosp Tubingen, Dept Psychosomat Med & Psychotherapy, Tubingen, Germany
[3] Univ Hosp Tubingen, Univ Dept Anesthesiol & Intens Care Med, Tubingen, Germany
[4] Sultan Qaboos Univ, Coll Med & Hlth Sci, Med Educ & Informat Dept, Muscat, Oman
[5] Univ Hosp Tubingen, Dept Diabetol Endocrinol Nephrol, Sect Nephrol & Hypertens, Tubingen, Germany
Keywords
answer; artificial intelligence; assessment; Bloom's taxonomy; ChatGPT; classification; error; exam; examination; generative; GPT-4; Generative Pre-trained Transformer 4; language model; learning outcome; LLM; MCQ; medical education; medical exam; multiple-choice question; natural language processing; NLP; psychosomatic; question; response; taxonomy; EDUCATION;
DOI
10.2196/52113
Chinese Library Classification
R19 [Health care organizations and services (health services administration)]
Abstract
Background: Large language models such as GPT-4 (Generative Pre-trained Transformer 4) are being increasingly used in medicine and medical education. However, these models are prone to "hallucinations" (ie, outputs that seem convincing while being factually incorrect). It is currently unknown how these errors by large language models relate to the different cognitive levels defined in Bloom's taxonomy.
Objective: This study aims to explore how GPT-4 performs in terms of Bloom's taxonomy using psychosomatic medicine exam questions.
Methods: We used a large data set of psychosomatic medicine multiple-choice questions (N=307) with real-world results derived from medical school exams. GPT-4 answered the multiple-choice questions using 2 distinct prompt versions: detailed and short. The answers were analyzed using a quantitative approach and a qualitative approach. Focusing on incorrectly answered questions, we categorized reasoning errors according to the hierarchical framework of Bloom's taxonomy.
Results: GPT-4's performance in answering exam questions yielded a high success rate: 93% (284/307) for the detailed prompt and 91% (278/307) for the short prompt. Questions answered correctly by GPT-4 had a statistically significantly higher difficulty than questions answered incorrectly (P=.002 for the detailed prompt and P<.001 for the short prompt). Independent of the prompt, GPT-4's lowest exam performance was 78.9% (15/19), thereby always surpassing the "pass" threshold. Our qualitative analysis of incorrect answers, based on Bloom's taxonomy, showed that errors were primarily in the "remember" (29/68) and "understand" (23/68) cognitive levels; specific issues arose in recalling details, understanding conceptual relationships, and adhering to standardized guidelines.
Conclusions: GPT-4 demonstrated a remarkable success rate when confronted with psychosomatic medicine multiple-choice exam questions, aligning with previous findings. When evaluated through Bloom's taxonomy, our data revealed that GPT-4 occasionally ignored specific facts (remember), provided illogical reasoning (understand), or failed to apply concepts to a new situation (apply). These errors, which were confidently presented, could be attributed to inherent model biases and the tendency to generate outputs that maximize likelihood.
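The proportions quoted in the abstract can be recomputed from the raw counts it gives. The following is a minimal sketch for checking that arithmetic only; the helper name `pct` and the dictionary labels are illustrative assumptions and do not come from the study, and the percentages for the "remember" and "understand" levels are derived here from the reported counts rather than stated in the abstract.

```python
# Recompute the percentages reported in the abstract from its raw counts.
# `pct` and the labels below are hypothetical names used for illustration.

def pct(count, total, digits=0):
    """Return count/total as a percentage rounded to `digits` decimals."""
    return round(100 * count / total, digits)

# (count, total, decimals) taken from the abstract
counts = {
    "detailed prompt, correct": (284, 307, 0),
    "short prompt, correct": (278, 307, 0),
    "lowest single-exam score": (15, 19, 1),
    "errors at 'remember' level": (29, 68, 1),
    "errors at 'understand' level": (23, 68, 1),
}

for label, (count, total, digits) in counts.items():
    print(f"{label}: {count}/{total} = {pct(count, total, digits)}%")
```

Under these counts, the "remember" and "understand" levels together account for 52 of the 68 categorized errors.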
Pages: 13