Evaluation of Large Language Model Performance and Reliability for Citations and References in Scholarly Writing: Cross-Disciplinary Study

Cited by: 8
Authors
Mugaanyi, Joseph [1]
Cai, Liuying [2]
Cheng, Sumei [2]
Lu, Caide [1]
Huang, Jing [1]
Affiliations
[1] Ningbo Univ, Lihuili Hosp, Hlth Sci Ctr, Ningbo Med Ctr, Dept Hepatopancreato Biliary Surg, 1111 Jiangnan Rd, Ningbo 315000, Peoples R China
[2] Shanghai Acad Social Sci, Inst Philosophy, Shanghai, Peoples R China
Keywords
large language models; accuracy; academic writing; AI; cross-disciplinary evaluation; scholarly writing; ChatGPT; GPT-3.5; writing tool; scholarly; academic discourse; LLMs; machine learning algorithms; NLP; natural language processing; citations; references; natural science; humanities; chatbot; artificial intelligence
DOI
10.2196/52935
Chinese Library Classification (CLC) number
R19 [Health care organizations and services (health services administration)]
Abstract
Background: Large language models (LLMs) have gained prominence since the release of ChatGPT in late 2022. Objective: The aim of this study was to assess the accuracy of citations and references generated by ChatGPT (GPT-3.5) in two distinct academic domains: the natural sciences and the humanities. Methods: Two researchers independently prompted ChatGPT to write an introduction section for a manuscript and to include citations; they then evaluated the accuracy of the citations and Digital Object Identifiers (DOIs). Results were compared between the two disciplines. Results: Ten topics were included, 5 in the natural sciences and 5 in the humanities. A total of 102 citations were generated, 55 in the natural sciences and 47 in the humanities. Among these, 40 citations (72.7%) in the natural sciences and 36 citations (76.6%) in the humanities were confirmed to exist (P=.42). DOI presence differed significantly between the natural sciences (39/55, 70.9%) and the humanities (18/47, 38.3%), as did DOI accuracy (18/55, 32.7% vs 4/47, 8.5%). DOI hallucination was more prevalent in the humanities (42/47, 89.4%). The Levenshtein distance between generated and actual DOIs was significantly higher in the humanities than in the natural sciences, reflecting the lower DOI accuracy. Conclusions: ChatGPT's performance in generating citations and references varies across disciplines. Differences in DOI standards and disciplinary nuances contribute to performance variations. Researchers should consider the strengths and limitations of artificial intelligence writing tools with respect to citation accuracy. The use of domain-specific models may enhance accuracy.
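The abstract uses the Levenshtein distance to quantify how far a generated DOI is from the correct one. The following is a minimal Python sketch of that metric, not the authors' implementation: the function names `levenshtein` and `doi_distance` are illustrative, the lowercase normalization is an assumption (DOIs are case-insensitive), and the "generated" DOI in the usage lines is a hypothetical string made up for this example.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    or substitutions needed to turn string a into string b."""
    # Keep the shorter string in the inner loop to limit memory use.
    if len(a) < len(b):
        a, b = b, a
    # previous[j] holds the distance between the processed prefix of a
    # and the first j characters of b.
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # deletion from a
                current[j - 1] + 1,      # insertion into a
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def doi_distance(generated: str, actual: str) -> int:
    """Edit distance between two DOIs after trimming whitespace and
    lowercasing (an assumed normalization, since DOIs are case-insensitive)."""
    return levenshtein(generated.strip().lower(), actual.strip().lower())


# Usage: the actual DOI is the one listed in this record; the generated
# DOI is a hypothetical example with two digits swapped.
actual_doi = "10.2196/52935"
generated_doi = "10.2196/52953"
print(doi_distance(generated_doi, actual_doi))
```

A distance of 0 means the generated DOI matches the real one exactly; larger values indicate a DOI that is further from the correct identifier.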
Pages: 7