Automated Fact-Checking of Claims from Wikipedia

Cited by: 0
Authors
Sathe, Aalok [1]
Ather, Salar [1]
Le, Tuan Manh [1]
Perry, Nathan [2]
Park, Joonsuk [1]
Affiliations
[1] Univ Richmond, Dept Math & Comp Sci, Richmond, VA 23173 USA
[2] Williams Coll, Dept Comp Sci, Williamstown, MA 01267 USA
Keywords
fact-checking; fact-verification; natural language inference; textual entailment; corpus
DOI
Not available
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
Automated fact checking is becoming increasingly vital as both truthful and fallacious information accumulate online. Research on fact checking has benefited from large-scale datasets such as FEVER and SNLI. However, such datasets have limited applicability because their claims and/or evidence are synthetic: they are written by annotators and differ from real claims and evidence on the internet. To address this, we present WIKIFACTCHECK-ENGLISH, a dataset of 124k+ triples, each consisting of a claim, its context, and an evidence document extracted from English Wikipedia articles and citations, as well as 34k+ manually written claims that are refuted by the evidence documents. This is the largest fact checking dataset consisting of real claims and evidence to date; it will allow the development of fact checking systems that can better process claims and evidence in the real world. We also show that for the NLI subtask, a logistic regression system trained using existing and novel features achieves a peak accuracy of 68%, providing a competitive baseline for future work. In addition, a decomposable attention model trained on SNLI significantly underperforms the models trained on this dataset, suggesting that models trained on manually generated data may not be sufficiently generalizable or suitable for fact checking real-world claims.
Pages: 6874-6882
Page count: 9