Hallucination in AI-generated financial literature reviews: evaluating bibliographic accuracy

Cited by: 0
Authors
Erdem, Orhan [1]
Hassett, Kristi [1]
Egriboyun, Feyzullah [2]
Affiliations
[1] Univ North Texas, Adv Data Analyt Dept, 1155 Union Circle 310830, Denton, TX 76203 USA
[2] HULT Int Business Sch, Hult House East,35 Commercial Rd, London E1 1LD, England
Keywords
Artificial intelligence; Chatbots; ChatGPT; Gemini; Hallucination; Large language model;
DOI
10.1007/s41060-025-00731-0
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We evaluate the reliability of three chatbots (ChatGPT-4o, o1-preview, and Gemini Advanced) in providing references on financial literature, employing novel methodologies. Alongside the conventional binary approach common in the literature, we develop a non-binary method that incorporates the degree of hallucination, and we introduce an age index to assess how hallucination rates vary with the recency of a topic. The study analyzes 150 citations for each chatbot across 15 financial topics. The results reveal significant differences in performance among the chatbots. ChatGPT-4o has a hallucination rate of 20.0%, and o1-preview has a hallucination rate of 21.3%. In contrast, Gemini Advanced exhibits a significantly higher hallucination rate of 76.7%. While hallucination rates increase for more recent topics, this trend is not statistically significant for Gemini Advanced. These findings emphasize the importance of verifying chatbot-provided references, particularly in rapidly evolving fields.
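As a rough illustration of the conventional binary approach mentioned in the abstract, the sketch below computes a hallucination rate as the share of citations flagged as hallucinated. The function name and the per-chatbot counts are assumptions: the abstract reports only percentages, so the counts are simply the integers that reproduce those percentages for 150 citations. The non-binary method and the age index are not specified in the abstract and are not modeled here.

```python
def hallucination_rate(flags):
    """Binary hallucination rate: fraction of citations flagged as hallucinated."""
    if not flags:
        raise ValueError("no citations to score")
    return sum(flags) / len(flags)


TOTAL_CITATIONS = 150  # stated in the abstract: 150 citations per chatbot

# ASSUMPTION: these counts are not reported in the abstract. They are the
# integers that reproduce the stated percentages when divided by 150.
assumed_hallucinated = {
    "ChatGPT-4o": 30,
    "o1-preview": 32,
    "Gemini Advanced": 115,
}

rates = {}
for chatbot, count in assumed_hallucinated.items():
    # One boolean per citation: True means the citation was judged hallucinated.
    flags = [True] * count + [False] * (TOTAL_CITATIONS - count)
    rates[chatbot] = round(100 * hallucination_rate(flags), 1)
    print(f"{chatbot}: {rates[chatbot]}%")
```

Under these assumed counts the rounded rates match the figures in the abstract (20.0%, 21.3%, and 76.7%).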
Pages: 10