Hallucination in AI-generated financial literature reviews: evaluating bibliographic accuracy

Cited by: 0
Authors
Erdem, Orhan [1]
Hassett, Kristi [1]
Egriboyun, Feyzullah [2]
Affiliations
[1] Univ North Texas, Adv Data Analyt Dept, 1155 Union Circle 310830, Denton, TX 76203 USA
[2] HULT Int Business Sch, Hult House East,35 Commercial Rd, London E1 1LD, England
Keywords
Artificial intelligence; Chatbots; ChatGPT; Gemini; Hallucination; Large language model
DOI
10.1007/s41060-025-00731-0
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We evaluate the reliability of three chatbots (ChatGPT-4o, o1-preview, and Gemini Advanced) in providing references on financial literature, employing novel methodologies. Alongside the conventional binary approach common in the literature, we develop a non-binary method that incorporates the degree of hallucination, and we introduce an age index to assess how hallucination rates vary with the recency of a topic. The study analyzes 150 citations for each chatbot across 15 financial topics. The results reveal significant differences in performance among the chatbots. ChatGPT-4o has a hallucination rate of 20.0%, while o1-preview has a hallucination rate of 21.3%. In contrast, Gemini Advanced exhibits a significantly higher hallucination rate of 76.7%. While hallucination rates increase for more recent topics, this trend is not statistically significant for Gemini Advanced. These findings emphasize the importance of verifying chatbot-provided references, particularly in rapidly evolving fields.
Pages: 10
Related papers
50 in total
  • [31] Evaluating the Coherence and Diversity in AI-Generated and Paraphrased Scientific Abstracts: A Fuzzy Topic Modeling Approach
    Onan, Aytug
    Celikten, Tugba
    INTELLIGENT AND FUZZY SYSTEMS, INFUS 2024 CONFERENCE, VOL 1, 2024, 1088 : 149 - 157
  • [32] Integrating AI into clinical education: evaluating general practice trainees' proficiency in distinguishing AI-generated hallucinations and impacting factors
    Zhou, Jiacheng
    Zhang, Jintao
    Wan, Rongrong
    Cui, Xiaochuan
    Liu, Qiyu
    Guo, Hua
    Shi, Xiaofen
    Fu, Bingbing
    Meng, Jia
    Yue, Bo
    Zhang, Yunyun
    Zhang, Zhiyong
    BMC MEDICAL EDUCATION, 2025, 25 (01)
  • [33] Physician vs. AI-generated messages in urology: evaluation of accuracy, completeness, and preference by patients and physicians
    Robinson, Eric J.
    Qiu, Chunyuan
    Sands, Stuart
    Khan, Mohammad
    Vora, Shivang
    Oshima, Kenichiro
    Nguyen, Khang
    Difronzo, L. Andrew
    Rhew, David
    Feng, Mark I.
    WORLD JOURNAL OF UROLOGY, 2024, 43 (01)
  • [34] Global-Local Image Perceptual Score (GLIPS): Evaluating Photorealistic Quality of AI-Generated Images
    Aziz, Memoona
    Rehman, Umair
    Danish, Muhammad Umair
    Grolinger, Katarina
    IEEE TRANSACTIONS ON HUMAN-MACHINE SYSTEMS, 2025, 55 (02) : 223 - 233
  • [35] Art or Artifact: Evaluating the Accuracy, Appeal, and Educational Value of AI-Generated Imagery in DALL·E 3 for Illustrating Congenital Heart Diseases
    Temsah, Mohamad-Hani
    Alhuzaimi, Abdullah N.
    Almansour, Mohammed
    Aljamaan, Fadi
    Alhasan, Khalid
    Batarfi, Munirah A.
    Altamimi, Ibraheem
    Alharbi, Amani
    Alsuhaibani, Adel Abdulaziz
    Alwakeel, Leena
    Alzahrani, Abdulrahman Abdulkhaliq
    Alsulaim, Khaled B.
    Jamal, Amr
    Khayat, Afnan
    Alghamdi, Mohammed Hussien
    Halwani, Rabih
    Khan, Muhammad Khurram
    Al-Eyadhy, Ayman
    Nazer, Rakan
    JOURNAL OF MEDICAL SYSTEMS, 2024, 48 (01)
  • [36] Evaluating AI-Generated Questions: A Mixed-Methods Analysis Using Question Data and Student Perceptions
    Van Campenhout, Rachel
    Hubertz, Martha
    Johnson, Benny G.
    ARTIFICIAL INTELLIGENCE IN EDUCATION, PT I, 2022, 13355 : 344 - 353
  • [37] Evaluating diagnostic content of AI-generated chest radiography: A multi-center visual Turing test
    Myong, Youho
    Yoon, Dan
    Kim, Byeong Soo
    Kim, Young Gyun
    Sim, Yongsik
    Lee, Suji
    Yoon, Jiyoung
    Cho, Minwoo
    Kim, Sungwan
    PLOS ONE, 2023, 18 (04)
  • [38] Unveiling AI-Generated Financial Text: A Computational Approach Using Natural Language Processing and Generative Artificial Intelligence
    Arshed, Muhammad Asad
    Gherghina, Stefan Cristian
    Dewi, Christine
    Iqbal, Asma
    Mumtaz, Shahzad
    COMPUTATION, 2024, 12 (05)
  • [39] Human-Written vs AI-Generated Texts in Orthopedic Academic Literature: Comparative Qualitative Analysis
    Hakam, Hassan Tarek
    Prill, Robert
    Korte, Lisa
    Lovreković, Bruno
    Ostojić, Marko
    Ramadanov, Nikolai
    Muehlensiepen, Felix
    JMIR FORMATIVE RESEARCH, 2024, 8
  • [40] Exploring the boundaries of authorship: a comparative analysis of AI-generated text and human academic writing in English literature
    Amirjalili, Forough
    Neysani, Masoud
    Nikbakht, Ahmadreza
    FRONTIERS IN EDUCATION, 2024, 9