Automated Fact-Checking of Claims from Wikipedia

Cited by: 0
Authors
Sathe, Aalok [1 ]
Ather, Salar [1 ]
Tuan Manh Le [1 ]
Perry, Nathan [2 ]
Park, Joonsuk [1 ]
Affiliations
[1] Univ Richmond, Dept Math & Comp Sci, Richmond, VA 23173 USA
[2] Williams Coll, Dept Comp Sci, Williamstown, MA 01267 USA
Keywords
fact-checking; fact-verification; natural language inference; textual entailment; corpus
DOI
Not available
CLC Classification Number
TP39 [Computer Applications]
Discipline Classification Codes
081203; 0835
Abstract
Automated fact-checking is becoming increasingly vital as both truthful and fallacious information accumulate online. Research on fact-checking has benefited from large-scale datasets such as FEVER and SNLI. However, such datasets have limited applicability because their claims and/or evidence are written by annotators and thus differ from real claims and evidence found on the internet. To address this, we present WIKIFACTCHECK-ENGLISH, a dataset of 124k+ triples consisting of a claim, context, and an evidence document extracted from English Wikipedia articles and citations, as well as 34k+ manually written claims that are refuted by the evidence documents. This is the largest fact-checking dataset of real claims and evidence to date; it will allow the development of fact-checking systems that can better process claims and evidence in the real world. We also show that for the NLI subtask, a logistic regression system trained using existing and novel features achieves a peak accuracy of 68%, providing a competitive baseline for future work. In addition, a decomposable attention model trained on SNLI significantly underperforms models trained on this dataset, suggesting that models trained on manually generated data may not be sufficiently generalizable or suitable for fact-checking real-world claims.
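As a rough illustration of the NLI-style baseline described in the abstract, the sketch below trains a logistic regression classifier over simple TF-IDF features of (claim, evidence) pairs. This is a minimal sketch under stated assumptions: the file name, column names, and label set are hypothetical, and the paper's actual "existing and novel features" are not reproduced here.

```python
# Minimal sketch of an NLI-style fact-checking baseline: logistic regression
# over TF-IDF features of (claim, evidence) pairs. The file path, column
# names, and labels below are hypothetical placeholders.
import pandas as pd
from scipy.sparse import hstack
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Hypothetical TSV with one (claim, evidence, label) triple per row.
data = pd.read_csv("wikifactcheck_pairs.tsv", sep="\t")

# Fit a shared vocabulary over claims and evidence, then concatenate
# the two feature blocks for each pair.
vectorizer = TfidfVectorizer(max_features=50_000, ngram_range=(1, 2))
vectorizer.fit(pd.concat([data["claim"], data["evidence"]]))
X = hstack([vectorizer.transform(data["claim"]),
            vectorizer.transform(data["evidence"])])
y = data["label"]  # e.g., SUPPORTED vs. REFUTED

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)

clf = LogisticRegression(max_iter=1000)
clf.fit(X_train, y_train)
print("accuracy:", accuracy_score(y_test, clf.predict(X_test)))
```

Richer feature sets (e.g., lexical overlap or negation cues) and stronger models would slot into the same claim-evidence classification setup.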
Pages: 6874-6882
Number of pages: 9