Hallucination in AI-generated financial literature reviews: evaluating bibliographic accuracy

Times cited: 0

Authors:

Erdem, Orhan [1]

Hassett, Kristi [1]

Egriboyun, Feyzullah [2]

Affiliations:

[1] Univ North Texas, Adv Data Analyt Dept, 1155 Union Circle 310830, Denton, TX 76203 USA

[2] HULT Int Business Sch, Hult House East, 35 Commercial Rd, London E1 1LD, England

Source:

INTERNATIONAL JOURNAL OF DATA SCIENCE AND ANALYTICS | 2025

Keywords:

Artificial intelligence; Chatbots; ChatGPT; Gemini; Hallucination; Large language model

DOI:

10.1007/s41060-025-00731-0

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

We evaluate the reliability of three chatbots (ChatGPT-4o, o1-preview, and Gemini Advanced) in providing references on financial literature, using novel methodologies. Alongside the conventional binary approach common in the literature, we develop a non-binary method that incorporates the degree of hallucination, and we introduce an age index to assess how hallucination rates vary with the recency of a topic. The study analyzes 150 citations for each chatbot across 15 financial topics. The results reveal significant differences in performance among the chatbots. ChatGPT-4o has a hallucination rate of 20.0%, while o1-preview has a hallucination rate of 21.3%. In contrast, Gemini Advanced exhibits a significantly higher hallucination rate of 76.7%. Although hallucination rates increase for more recent topics, this trend is not statistically significant for Gemini Advanced. These findings emphasize the importance of verifying chatbot-provided references, particularly in rapidly evolving fields.
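The binary hallucination rate above is the share of a chatbot's 150 citations judged to be hallucinated. The minimal sketch below reproduces that arithmetic. The hallucinated counts (30, 32, 115) are an assumption back-calculated from the reported percentages, not figures quoted from the paper, and the pooled two-proportion z statistic is one standard way to compare two rates, not necessarily the test the authors used. The helper names are hypothetical.

```python
import math

# 150 citations per chatbot, as reported in the abstract.
TOTAL = 150

# ASSUMPTION: counts back-calculated from the reported rates, not quoted data.
hallucinated = {
    "ChatGPT-4o": 30,        # 30 / 150 = 20.0%
    "o1-preview": 32,        # 32 / 150 = 21.3% (rounded)
    "Gemini Advanced": 115,  # 115 / 150 = 76.7% (rounded)
}


def hallucination_rate(bad: int, total: int) -> float:
    """Binary measure: fraction of citations judged hallucinated."""
    return bad / total


def two_proportion_z(x1: int, n1: int, x2: int, n2: int) -> float:
    """Pooled two-proportion z statistic for the difference p1 - p2.

    One common test for comparing two rates; the paper's own test may differ.
    """
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se


if __name__ == "__main__":
    for name, bad in hallucinated.items():
        print(f"{name}: {hallucination_rate(bad, TOTAL):.1%}")

    z = two_proportion_z(hallucinated["Gemini Advanced"], TOTAL,
                         hallucinated["ChatGPT-4o"], TOTAL)
    print(f"z (Gemini Advanced vs ChatGPT-4o) = {z:.2f}")
```

A large positive z here only illustrates the size of the gap between the assumed counts; it does not reproduce the significance results reported in the study.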

Pages: 10
