Comparing the performance of ChatGPT GPT-4, Bard, and Llama-2 in the Taiwan Psychiatric Licensing Examination and in differential diagnosis with multi-center psychiatrists

Cited by: 8
Authors
Li, Dian-Jeng [1, 2]
Kao, Yu-Chen [3, 4]
Tsai, Shih-Jen [5, 6]
Bai, Ya-Mei [5, 6, 7]
Yeh, Ta-Chuan [3]
Chu, Che-Sheng [8, 9, 10, 11]
Hsu, Chih-Wei [12]
Cheng, Szu-Wei [13, 14, 15]
Hsu, Tien-Wei [16, 17]
Liang, Chih-Sung [3, 4]
Su, Kuan-Pin [14, 15, 18, 19]
Affiliations
[1] Kaohsiung Municipal Kai Syuan Psychiat Hosp, Dept Addict Sci, Kaohsiung, Taiwan
[2] Meiho Univ, Dept Nursing, Pingtung, Taiwan
[3] Triserv Gen Hosp, Natl Def Med Ctr, Dept Psychiat, Taipei, Taiwan
[4] Triserv Gen Hosp, Dept Psychiat, Beitou Branch, Taipei, Taiwan
[5] Taipei Vet Gen Hosp, Dept Psychiat, Taipei, Taiwan
[6] Natl Yang Ming Chiao Tung Univ, Coll Med, Dept Psychiat, Taipei, Taiwan
[7] Natl Yang Ming Chiao Tung Univ, Inst Brain Sci, Taipei, Taiwan
[8] Kaohsiung Vet Gen Hosp, Ctr Geriatr & Gerontol, Kaohsiung, Taiwan
[9] Noninvas Neuromodulat Consortium Mental Disorders, Soc Psychophysiol, Taipei, Taiwan
[10] Kaohsiung Med Univ, Grad Inst Med, Coll Med, Kaohsiung, Taiwan
[11] Kaohsiung Vet Gen Hosp, Dept Psychiat, Kaohsiung, Taiwan
[12] Kaohsiung Chang Gung Mem Hosp, Dept Psychiat, Kaohsiung, Taiwan
[13] Chi Mei Med Ctr, Dept Gen Med, Tainan, Taiwan
[14] China Med Univ Hosp, Mind Body Interface Lab MBI Lab, Taichung, Taiwan
[15] China Med Univ Hosp, Dept Psychiat, Taichung, Taiwan
[16] I Shou Univ, E DA Dachang Hosp, Dept Psychiat, Kaohsiung, Taiwan
[17] I Shou Univ, E DA Hosp, Dept Orthoped, Kaohsiung, Taiwan
[18] China Med Univ, Coll Med, Taichung, Taiwan
[19] China Med Univ, An Nan Hosp, Tainan, Taiwan
Keywords
chatbot; ChatGPT; differential diagnosis in psychiatry; psychiatric application; Taiwanese psychiatric licensing examination
DOI
10.1111/pcn.13656
Chinese Library Classification (CLC) number
R74 [Neurology and Psychiatry]
Abstract
Aim: Large language models (LLMs) have been suggested to play a role in medical education and medical practice. However, their potential for application in the psychiatric domain has not been well studied. Method: In the first step, we compared the performance of ChatGPT GPT-4, Bard, and Llama-2 in the 2022 Taiwan Psychiatric Licensing Examination, conducted in traditional Mandarin. In the second step, we compared the scores of these three LLMs with those of 24 experienced psychiatrists on 10 advanced clinical scenario questions designed for psychiatric differential diagnosis. Result: Only GPT-4 passed the 2022 Taiwan Psychiatric Licensing Examination (scoring 69, with ≥ 60 considered a passing grade), while Bard scored 36 and Llama-2 scored 25. GPT-4 outperformed Bard and Llama-2, especially in the areas of 'Pathophysiology & Epidemiology' (χ² = 22.4, P < 0.001) and 'Psychopharmacology & Other therapies' (χ² = 15.8, P < 0.001). In the differential diagnosis, the mean score of the 24 experienced psychiatrists (mean 6.1, standard deviation 1.9) was higher than the scores of GPT-4 (5), Bard (3), and Llama-2 (1). Conclusion: Compared to Bard and Llama-2, GPT-4 demonstrated superior abilities in identifying psychiatric symptoms and making clinical judgments. Moreover, GPT-4's ability for differential diagnosis closely approached that of the experienced psychiatrists. Among the three LLMs, GPT-4 showed promising potential as a valuable tool in psychiatric practice.
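To make the differential-diagnosis comparison concrete, the short sketch below expresses each LLM's score as a distance from the psychiatrists' mean in units of the reported standard deviation. It uses only the figures given in the abstract (mean 6.1, SD 1.9, and scores of 5, 3, and 1). The helper name `standardized_gap` is hypothetical, and this descriptive calculation is an illustration only, not an analysis reported by the authors.

```python
# Values reported in the abstract for the 10 differential-diagnosis questions.
PSYCHIATRIST_MEAN = 6.1  # mean score of the 24 psychiatrists
PSYCHIATRIST_SD = 1.9    # standard deviation of their scores
llm_scores = {"GPT-4": 5, "Bard": 3, "Llama-2": 1}


def standardized_gap(score, mean=PSYCHIATRIST_MEAN, sd=PSYCHIATRIST_SD):
    """Hypothetical helper: (score - mean) / sd, i.e. the gap in SD units."""
    return (score - mean) / sd


# Negative values mean the model scored below the psychiatrists' mean.
gaps = {name: standardized_gap(score) for name, score in llm_scores.items()}

for name, gap in gaps.items():
    print(f"{name}: {gap:+.2f} SD relative to the psychiatrists' mean")
```

Under these figures, GPT-4 sits about 0.58 SD below the psychiatrists' mean, Bard about 1.63 SD below, and Llama-2 about 2.68 SD below, which is one way to read the statement that GPT-4 came closest to the psychiatrists.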
Pages: 347-352
Number of pages: 6