Automated Fact-Checking of Claims from Wikipedia

Authors
Sathe, Aalok [1]
Ather, Salar [1]
Tuan Manh Le [1]
Perry, Nathan [2]
Park, Joonsuk [1]
Affiliations
[1] Univ Richmond, Dept Math & Comp Sci, Richmond, VA 23173 USA
[2] Williams Coll, Dept Comp Sci, Williamstown, MA 01267 USA
Keywords
fact-checking; fact-verification; natural language inference; textual entailment; corpus
DOI
Not available
Chinese Library Classification number
TP39 [Computer applications]
Discipline classification codes
081203; 0835
Abstract
Automated fact checking is becoming increasingly vital as both truthful and fallacious information accumulate online. Research on fact checking has benefited from large-scale datasets such as FEVER and SNLI. However, such datasets suffer from limited applicability due to the synthetic nature of claims and/or evidence written by annotators that differ from real claims and evidence on the internet. To this end, we present WIKIFACTCHECK-ENGLISH, a dataset of 124k+ triples consisting of a claim, context and an evidence document extracted from English Wikipedia articles and citations, as well as 34k+ manually written claims that are refuted by the evidence documents. This is the largest fact checking dataset consisting of real claims and evidence to date; it will allow the development of fact checking systems that can better process claims and evidence in the real world. We also show that for the NLI subtask, a logistic regression system trained using existing and novel features achieves peak accuracy of 68%, providing a competitive baseline for future work. Also, a decomposable attention model trained on SNLI significantly underperforms the models trained on this dataset, suggesting that models trained on manually generated data may not be sufficiently generalizable or suitable for fact checking real-world claims.
Pages: 6874-6882
Page count: 9