A survey of datasets in medicine for large language models

Times cited: 0

Authors:

Zhang, Deshiwei [1]

Xue, Xiaojuan [2]

Gao, Peng [3]

Jin, Zhijuan [4]

Hu, Menghan [2]

Wu, Yue [5]

Ying, Xiayang [6]

Affiliations:

[1] Southeast Univ, Sch Civil Engn, Nanjing 210096, Jiangsu, Peoples R China

[2] East China Normal Univ, Shanghai Key Lab Multidimens Informat Proc, 500 Dongchuan Rd, Shanghai 200241, Peoples R China

[3] Tongji Univ, Shanghai Peoples Hosp 10, Sch Med, Dept Ophthalmol, Shanghai 200072, Peoples R China

[4] Shanghai Jiao Tong Univ, Shanghai Childrens Med Ctr, Sch Med, Dept Dev & Behav Pediat, Shanghai 200127, Peoples R China

[5] Shanghai Jiao Tong Univ, Peoples Hosp 9, Sch Med, Dept Ophthalmol, Shanghai 200011, Peoples R China

[6] Shanghai Jiao Tong Univ, Ruijin Hosp, Pancreat Dis Ctr, Sch Med, Dept Gen Surg, 197 Ruijin 2nd Rd, Shanghai 200001, Peoples R China

Source:

INTELLIGENCE & ROBOTICS | 2024 / Vol. 4 / No. 04

Keywords:

Large language models (LLMs); NLP; dataset in medicine; Q&A system in medicine

DOI:

10.20517/ir.2024.27

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology]

Discipline code:

0812

Abstract:

With the advent of ChatGPT and similar models, large language models (LLMs) have demonstrated unprecedented capabilities in understanding and generating natural language, presenting novel opportunities and challenges in medicine. While many studies have focused on applying LLMs in medicine, comprehensive reviews of the datasets used in this field remain scarce. This survey addresses this gap by providing a comprehensive overview of the medical datasets that fuel LLMs, highlighting their unique characteristics and the critical roles they play at each stage of LLM development: pre-training, fine-tuning, and evaluation. Ultimately, this survey aims to underline the significance of datasets in realizing the full potential of LLMs to innovate and improve healthcare outcomes.

Pages: 457-478

Page count: 22
