Evaluating the value of AI-generated questions for USMLE step 1 preparation: A study using ChatGPT-3.5

Cited by: 0
Authors
Balu, Alan [1 ]
Prvulovic, Stefan T. [1 ]
Perez, Claudia Fernandez [1 ]
Kim, Alexander [1 ]
Donoho, Daniel A. [3 ]
Keating, Gregory [2 ]
Affiliations
[1] Georgetown Univ, Dept Neurosurg, Sch Med, Washington, DC USA
[2] Medstar Georgetown Univ Hosp, Dept Neurosurg, Washington, DC USA
[3] Childrens Natl Hosp, Dept Neurosurg, Washington, DC USA
Keywords
ChatGPT; USMLE Step 1; LLM; artificial intelligence; future
DOI
10.1080/0142159X.2025.2478872
Chinese Library Classification (CLC)
G40 [Education]
Discipline codes
040101; 120403
Abstract
Purpose: Students are increasingly relying on artificial intelligence (AI) for medical education and exam preparation. However, the factual accuracy and content distribution of AI-generated exam questions for self-assessment have not been systematically investigated. Methods: Curated prompts were created to generate multiple-choice questions matching the USMLE Step 1 examination style. We utilized ChatGPT-3.5 to generate 50 questions and answers based upon each prompt style. We manually examined the output for factual accuracy, Bloom's Taxonomy level, and category within the USMLE Step 1 content outline. Results: ChatGPT-3.5 generated 150 multiple-choice case-style questions and selected an answer for each. Overall, 83% of generated multiple-choice questions had no factual inaccuracies and 15% contained one to two factual inaccuracies. With simple prompting, common themes included deep venous thrombosis, myocardial infarction, and thyroid disease. Topic diversity improved by separating content topic generation from question generation, and specificity to Step 1 increased by indicating that "treatment" questions were not desired. Conclusion: We demonstrate that ChatGPT-3.5 can successfully generate Step 1 style questions with reasonable factual accuracy, and this method may be used by medical students preparing for USMLE examinations. While AI-generated questions demonstrated adequate factual accuracy, targeted prompting techniques should be used to overcome ChatGPT's bias towards particular medical conditions.
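The two-stage prompting strategy described in the Methods (generating content topics separately from the questions themselves, and explicitly stating that treatment questions are not desired) can be illustrated with a minimal sketch. This is not the authors' exact prompts or workflow, which the abstract does not reproduce; it assumes the OpenAI Python SDK, the gpt-3.5-turbo model as a stand-in for "ChatGPT-3.5", and illustrative prompt wording.

```python
"""
Illustrative two-stage prompting sketch (assumed prompts, not the study's).

Stage 1: ask the model for a list of USMLE Step 1 content topics.
Stage 2: for each topic, ask for a case-style multiple-choice question,
         excluding treatment/management questions to stay within Step 1 scope.

Assumes the OpenAI Python SDK (`pip install openai`) and an API key in the
OPENAI_API_KEY environment variable.
"""
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
MODEL = "gpt-3.5-turbo"  # assumed stand-in for "ChatGPT-3.5"


def ask(prompt: str) -> str:
    """Send a single user prompt and return the model's text reply."""
    response = client.chat.completions.create(
        model=MODEL,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content.strip()


def generate_topics(n: int = 10) -> list[str]:
    """Stage 1: separate topic generation to improve content diversity."""
    reply = ask(
        f"List {n} distinct topics from the USMLE Step 1 content outline, "
        "one per line, without numbering."
    )
    return [line.strip() for line in reply.splitlines() if line.strip()]


def generate_question(topic: str) -> str:
    """Stage 2: generate one case-style MCQ for a given topic."""
    return ask(
        "Write one USMLE Step 1 style clinical vignette multiple-choice "
        f"question about {topic} with five answer options (A-E). "
        "Do not write a treatment or management question. "
        "State the correct answer and a brief explanation at the end."
    )


if __name__ == "__main__":
    for topic in generate_topics(5):
        print(f"--- {topic} ---")
        print(generate_question(topic))
```

Generated items would still require the kind of manual review for factual accuracy, Bloom's Taxonomy level, and content-outline category that the study performed.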
Pages: 9