Scientific evidence and specific context: leveraging large language models for health fact-checking

Cited by: 0
Authors
Ni, Zhenni [1 ,2 ]
Qian, Yuxing [3 ]
Chen, Shuaipu [1 ]
Jaulent, Marie-Christine [2 ]
Bousquet, Cedric [2 ,4 ]
Affiliations
[1] Wuhan Univ, Sch Informat Management, Wuhan, Peoples R China
[2] Natl Inst Hlth & Med Res, Lab Med Informat & Knowledge Engn E Hlth LIMICS, INSERM, Paris, France
[3] Nanjing Univ, Sch Journalism & Commun, Nanjing, Peoples R China
[4] Univ Hosp St Etienne, Unit Publ Hlth, St Etienne, France
Funding
China Postdoctoral Science Foundation; National Natural Science Foundation of China;
Keywords
Large language models; Dual process theory; Health misinformation; Fact-checking; Fake news detection;
DOI
10.1108/OIR-02-2024-0111
Chinese Library Classification (CLC)
TP [Automation Technology, Computer Technology];
Discipline code
0812 ;
Abstract
Purpose
This study aims to evaluate the performance of LLMs with various prompt engineering strategies in the context of health fact-checking.
Design/methodology/approach
Inspired by Dual Process Theory, we introduce two kinds of prompts: Conclusion-first (System 1) and Explanation-first (System 2), and their respective retrieval-augmented variants. We evaluate the performance of these prompts in terms of accuracy, argument elements, common errors and cost-effectiveness. Our study, conducted on two public health fact-checking datasets, categorized 10,212 claims as knowledge, anecdotes and news. To further analyze the reasoning process of the LLMs, we examine the argument elements of the health fact-checking outputs generated by the different prompts, revealing their tendencies in using evidence and contextual qualifiers. We also conducted a content analysis to identify and compare the common errors across prompts.
Findings
Results indicate that the Conclusion-first prompt performs well on knowledge (89.70%, 66.09%), anecdote (79.49%, 79.99%) and news (85.61%, 85.95%) claims even without retrieval augmentation, proving to be cost-effective. In contrast, the Explanation-first prompt often classifies claims as unknown; with retrieval augmentation, however, it significantly boosts accuracy on news claims (87.53%, 88.60%) and anecdote claims (87.28%, 90.62%). The Explanation-first prompt attends more to context specificity and user intent during health fact-checking, showing high potential when combined with retrieval augmentation. Additionally, retrieval-augmented LLMs concentrate more on evidence and context, highlighting the importance of the relevance and safety of the retrieved content.
Originality/value
This study offers insights into how a balanced integration of the two prompt styles could enhance the overall performance of LLMs in critical applications, paving the way for future research on optimizing LLMs for complex cognitive tasks.
Peer review
The peer review history for this article is available at: https://publons.com/publon/10.1108/OIR-02-2024-0111
Pages: 1488-1514
Page count: 27
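
The Design/methodology/approach section describes two prompt styles, Conclusion-first (System 1) and Explanation-first (System 2), each with a retrieval-augmented variant. The record does not give the paper's exact prompt wording, so the following is only a minimal Python sketch under assumed wording; the templates and the build_prompt helper are hypothetical illustrations of the four variants, not the authors' prompts.

    # Hypothetical sketch: the prompt text below is assumed, not taken from the paper.

    CONCLUSION_FIRST = (
        # "System 1" style: verdict first, justification second
        "Claim: {claim}\n"
        "State whether the claim is true, false or unknown, "
        "then briefly explain your verdict."
    )

    EXPLANATION_FIRST = (
        # "System 2" style: step-by-step reasoning before the verdict
        "Claim: {claim}\n"
        "Analyze the claim step by step, weighing the evidence and the "
        "specific context, then state your verdict: true, false or unknown."
    )

    EVIDENCE_BLOCK = (
        # appended only for the retrieval-augmented variants
        "\nRetrieved evidence:\n{evidence}\n"
        "Base your judgment on the evidence above where it is relevant."
    )

    def build_prompt(claim: str, explanation_first: bool = False,
                     evidence: str | None = None) -> str:
        # Assemble one of the 2 styles x 2 retrieval settings = 4 variants.
        template = EXPLANATION_FIRST if explanation_first else CONCLUSION_FIRST
        prompt = template.format(claim=claim)
        if evidence is not None:
            prompt += EVIDENCE_BLOCK.format(evidence=evidence)
        return prompt

    # Example: Explanation-first prompt with retrieval augmentation, the
    # combination the Findings section reports as most accurate for news
    # and anecdote claims.
    print(build_prompt("Vitamin C cures the common cold.",
                       explanation_first=True,
                       evidence="Systematic reviews report no curative "
                                "effect of vitamin C on colds."))

Separating the evidence block from the base templates mirrors the study's design, where each prompt style is evaluated both with and without retrieval augmentation while the rest of the prompt is held constant.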