Integrating AI into clinical education: evaluating general practice trainees' proficiency in distinguishing AI-generated hallucinations and impacting factors

Cited by: 0
Authors
Zhou, Jiacheng [1 ,2 ]
Zhang, Jintao [1 ,2 ]
Wan, Rongrong [1 ,2 ]
Cui, Xiaochuan [1 ,2 ]
Liu, Qiyu [1 ,2 ]
Guo, Hua [1 ,2 ]
Shi, Xiaofen [1 ,2 ]
Fu, Bingbing [3 ]
Meng, Jia [4 ]
Yue, Bo [5 ]
Zhang, Yunyun [1 ,2 ,3 ,6 ]
Zhang, Zhiyong [1 ,2 ,3 ,6 ]
Affiliations
[1] Nanjing Med Univ, Affiliated Wuxi Peoples Hosp, Dept Gen Practice, Wuxi, Jiangsu, Peoples R China
[2] Nanjing Med Univ, Wuxi Peoples Hosp, Wuxi Med Ctr, Wuxi, Jiangsu, Peoples R China
[3] Jiamusi Univ, Affiliated Hosp 1, Dept Postgrad Educ, Heilongjiang, Peoples R China
[4] Harbin Med Univ, Affiliated Hosp 2, Dept Gen Practice, Heilongjiang, Peoples R China
[5] Qiqihar Med Univ, Affiliated Hosp 2, Residency Training Ctr, Heilongjiang, Peoples R China
[6] Nanjing Med Univ, Wuxi Peoples Hosp, Affiliated Wuxi Peoples Hosp, Wuxi Med Ctr,Educ Dept, Qingyang Rd 299, Wuxi, Peoples R China
Keywords
ChatGPT-4o generated hallucinations; General practice (GP) trainees; General practice specialist training; Response bias; ARTIFICIAL-INTELLIGENCE; CHATGPT;
DOI
10.1186/s12909-025-06916-2
CLC classification
G40 [Education]
Discipline codes
040101; 120403
Abstract
Objective: ChatGPT-4o was used to assess the ability of General Practice (GP) trainees to detect AI-generated hallucinations in simulated clinical practice. The hallucinations were categorized into three types based on the accuracy of the answers and explanations: (1) correct answers with incorrect or flawed explanations, (2) incorrect answers with explanations that contradict factual evidence, and (3) incorrect answers with correct explanations.
Methods: This multi-center, cross-sectional survey study involved 142 GP trainees, all of whom were undergoing General Practice Specialist Training and volunteered to participate. The study evaluated the accuracy and consistency of ChatGPT-4o, as well as the trainees' response time, accuracy, sensitivity (d'), and response bias (beta). Binary regression analysis was used to explore factors affecting the trainees' ability to identify errors generated by ChatGPT-4o.
Results: A total of 137 participants were included, with a mean age of 25.93 years. Half of the participants were unfamiliar with AI, and 35.0% had never used it. ChatGPT-4o's overall accuracy was 80.8%, decreasing slightly to 80.1% after human verification. For professional practice (Subject 4), however, accuracy was only 57.0%, dropping further to 44.2% after human verification. A total of 87 AI-generated hallucinations were identified, occurring primarily at the application and evaluation levels. The mean accuracy of detecting these hallucinations was 55.0%, and the mean sensitivity (d') was 0.39. Regression analysis revealed that shorter response times (OR = 0.92, P = 0.02), higher self-assessed AI understanding (OR = 0.16, P = 0.04), and more frequent AI use (OR = 10.43, P = 0.01) were associated with stricter error-detection criteria.
Conclusions: GP trainees struggled to identify ChatGPT-4o's errors, particularly in clinical scenarios. This highlights the importance of improving AI literacy and critical thinking skills to ensure the effective integration of AI into medical education.
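The sensitivity (d') and response bias (beta) reported in the abstract come from signal detection theory. A minimal sketch of how these two metrics are typically computed from a trainee's raw hit and false-alarm counts follows; the function name and the log-linear (+0.5) correction are illustrative assumptions, not details taken from the paper itself:

```python
import math
from statistics import NormalDist


def signal_detection(hits, misses, false_alarms, correct_rejections):
    """Sensitivity d' and response bias (beta) from raw counts.

    A log-linear (+0.5) correction keeps hit/false-alarm rates away
    from exactly 0 or 1, which would yield infinite z-scores.
    """
    hit_rate = (hits + 0.5) / (hits + misses + 1.0)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1.0)
    z = NormalDist().inv_cdf                       # inverse standard-normal CDF
    z_h, z_f = z(hit_rate), z(fa_rate)
    d_prime = z_h - z_f                            # higher = better discrimination
    beta = math.exp((z_f ** 2 - z_h ** 2) / 2.0)   # > 1 = conservative criterion
    return d_prime, beta
```

Under this convention, a beta above 1 reflects a stricter (more conservative) criterion for flagging an answer as a hallucination, which is consistent with the "stricter error-detection criteria" interpretation in the Results.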
Pages: 9