A survey of datasets in medicine for large language models

Authors
Zhang, Deshiwei [1]
Xue, Xiaojuan [2]
Gao, Peng [3]
Jin, Zhijuan [4]
Hu, Menghan [2]
Wu, Yue [5]
Ying, Xiayang [6]
Affiliations
[1] Southeast Univ, Sch Civil Engn, Nanjing 210096, Jiangsu, Peoples R China
[2] East China Normal Univ, Shanghai Key Lab Multidimens Informat Proc, 500 Dongchuan Rd, Shanghai 200241, Peoples R China
[3] Tongji Univ, Shanghai Peoples Hosp 10, Sch Med, Dept Ophthalmol, Shanghai 200072, Peoples R China
[4] Shanghai Jiao Tong Univ, Shanghai Childrens Med Ctr, Sch Med, Dept Dev & Behav Pediat, Shanghai 200127, Peoples R China
[5] Shanghai Jiao Tong Univ, Peoples Hosp 9, Sch Med, Dept Ophthalmol, Shanghai 200011, Peoples R China
[6] Shanghai Jiao Tong Univ, Ruijin Hosp, Pancreat Dis Ctr, Sch Med, Dept Gen Surg, 197 Ruijin 2nd Rd, Shanghai 200001, Peoples R China
Source
INTELLIGENCE & ROBOTICS | 2024 / Vol. 4 / Issue 04
Keywords
Large language models (LLMs); NLP; dataset in medicine; Q&A system in medicine
DOI
10.20517/ir.2024.27
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
With the advent of models such as ChatGPT, large language models (LLMs) have demonstrated unprecedented capabilities in understanding and generating natural language, presenting novel opportunities and challenges in the medical domain. While many studies have focused on applying LLMs in medicine, comprehensive reviews of the datasets used in this field remain scarce. This survey seeks to address this gap by providing a comprehensive overview of the medical datasets fueling LLMs, highlighting their unique characteristics and the critical roles they play at different stages of LLM development: pre-training, fine-tuning, and evaluation. Ultimately, this survey aims to underline the significance of datasets in realizing the full potential of LLMs to innovate and improve healthcare outcomes.
Pages: 457-478
Page count: 22