Hallucination in AI-generated financial literature reviews: evaluating bibliographic accuracy

Cited by: 0
Authors
Erdem, Orhan [1]
Hassett, Kristi [1]
Egriboyun, Feyzullah [2]
Affiliations
[1] Univ North Texas, Adv Data Analyt Dept, 1155 Union Circle 310830, Denton, TX 76203 USA
[2] HULT Int Business Sch, Hult House East,35 Commercial Rd, London E1 1LD, England
Keywords
Artificial intelligence; Chatbots; ChatGPT; Gemini; Hallucination; Large language model
DOI
10.1007/s41060-025-00731-0
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We evaluate the reliability of three chatbots (ChatGPT-4o, o1-preview, and Gemini Advanced) in providing references on financial literature, employing novel methodologies. Alongside the conventional binary approach common in the literature, we develop a non-binary method that incorporates the degree of hallucination, and we introduce an age index to assess how hallucination rates vary with the recency of a topic. The study analyzes 150 citations for each chatbot across 15 financial topics. The results reveal significant differences in performance among the chatbots. ChatGPT-4o has a hallucination rate of 20.0%, while o1-preview has a hallucination rate of 21.3%. In contrast, Gemini Advanced exhibits a significantly higher hallucination rate of 76.7%. While hallucination rates increase for more recent topics, this trend is not statistically significant for Gemini Advanced. These findings emphasize the importance of verifying chatbot-provided references, particularly in rapidly evolving fields.
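As a rough illustration of the binary approach mentioned in the abstract, the sketch below computes a hallucination rate as the share of citations flagged as hallucinated, and back-calculates the citation counts that the reported percentages would imply for 150 citations per chatbot. The counts are inferred by rounding and are an assumption, not figures stated in this record; the function names are hypothetical. The non-binary method and the age index are not defined in the abstract, so they are not sketched here.

```python
# Minimal sketch, assuming each citation carries a binary flag
# (True = hallucinated). Function names are hypothetical.

CITATIONS_PER_CHATBOT = 150  # stated in the abstract

# Hallucination rates as reported in the abstract.
reported_rates = {
    "ChatGPT-4o": 0.200,
    "o1-preview": 0.213,
    "Gemini Advanced": 0.767,
}


def binary_hallucination_rate(flags):
    """Share of citations flagged as hallucinated."""
    return sum(flags) / len(flags)


def implied_count(rate, n=CITATIONS_PER_CHATBOT):
    """Nearest whole number of hallucinated citations for a reported rate.

    This is an inference from the rounded percentage, not a reported figure.
    """
    return round(rate * n)


implied = {name: implied_count(rate) for name, rate in reported_rates.items()}

for name, count in implied.items():
    # Rebuild a flag list from the implied count and recompute the rate.
    flags = [True] * count + [False] * (CITATIONS_PER_CHATBOT - count)
    rate = binary_hallucination_rate(flags)
    print(f"{name}: about {count} of {CITATIONS_PER_CHATBOT} ({rate:.1%})")
```

Recomputing the rate from each implied count reproduces the reported percentage to one decimal place, which is a quick consistency check on the abstract's figures.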
Pages: 10