Vandalism Detection in Wikidata

Cited by: 47
Authors
Heindorf, Stefan [1]
Potthast, Martin [2]
Stein, Benno [2]
Engels, Gregor [1]
Affiliations
[1] Univ Paderborn, Paderborn, Germany
[2] Bauhaus Univ Weimar, Weimar, Germany
Keywords
Knowledge Base; Vandalism; Data Quality; Trust
DOI
10.1145/2983323.2983740
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Wikidata is the new, large-scale knowledge base of the Wikimedia Foundation. Its knowledge is increasingly used within Wikipedia itself and various other kinds of information systems, imposing high demands on its integrity. Wikidata can be edited by anyone and, unfortunately, it frequently gets vandalized, exposing all information systems using it to the risk of spreading vandalized and falsified information. In this paper, we present a new machine learning-based approach to detect vandalism in Wikidata. We propose a set of 47 features that exploit both content and context information, and we report on 4 classifiers of increasing effectiveness tailored to this learning task. Our approach is evaluated on the recently published Wikidata Vandalism Corpus WDVC-2015 and it achieves an area under curve value of the receiver operating characteristic, ROCAUC, of 0.991. It significantly outperforms the state of the art represented by the rule-based Wikidata Abuse Filter (0.865 ROCAUC) and a prototypical vandalism detector recently introduced by Wikimedia within the Objective Revision Evaluation Service (0.859 ROCAUC).
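The abstract compares detectors by ROC AUC. As a minimal sketch of what that metric measures (not the authors' evaluation code), the snippet below computes ROC AUC as the probability that a randomly chosen vandalism revision receives a higher score than a randomly chosen regular revision, with ties counted as one half. The function name `roc_auc` and the toy labels and scores are assumptions made for illustration; they are not taken from WDVC-2015.

```python
def roc_auc(labels, scores):
    """Return ROC AUC via pairwise comparison of positive and negative scores.

    labels: iterable of 0 (regular) or 1 (vandalism); hypothetical toy encoding.
    scores: iterable of classifier scores, higher meaning more suspicious.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(pos) * len(neg))


# Toy data (made up): two vandalism revisions, four regular revisions.
labels = [0, 0, 1, 0, 1, 0]
scores = [0.10, 0.40, 0.35, 0.20, 0.80, 0.05]
auc = roc_auc(labels, scores)
print(auc)  # 7 of 8 positive/negative pairs are ordered correctly: 0.875
```

The pairwise form is quadratic in the number of revisions, so it only suits small examples; on a corpus-sized dataset a rank-based or library implementation would be the practical choice.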
Pages: 327-336
Page count: 10
Related papers
50 in total
  • [1] Debiasing Vandalism Detection Models at Wikidata
    Heindorf, Stefan
    Scholten, Yan
    Engels, Gregor
    Potthast, Martin
    WEB CONFERENCE 2019: PROCEEDINGS OF THE WORLD WIDE WEB CONFERENCE (WWW 2019), 2019, : 670 - 680
  • [2] Building Automated Vandalism Detection Tools for Wikidata
    Sarabadani, Amir
    Halfaker, Aaron
    Taraborelli, Dario
    WWW'17 COMPANION: PROCEEDINGS OF THE 26TH INTERNATIONAL CONFERENCE ON WORLD WIDE WEB, 2017, : 1647 - 1654
  • [3] PROBABILITY OF DETECTION AND INSTITUTIONAL VANDALISM
    GRAHAM, F
    BRITISH JOURNAL OF CRIMINOLOGY, 1981, 21 (04): : 361 - 365
  • [4] Automatic vandalism detection in Wikipedia
    Potthast, Martin
    Stein, Benno
    Gerling, Robert
    ADVANCES IN INFORMATION RETRIEVAL, 2008, 4956 : 663 - 668
  • [5] Towards Automatic Vandalism Detection in OpenStreetMap
    Neis, Pascal
    Goetz, Marcus
    Zipf, Alexander
    ISPRS INTERNATIONAL JOURNAL OF GEO-INFORMATION, 2012, 1 (03) : 315 - 332
  • [6] Wikidata Atlas: Putting Wikidata on the Map
    del Pino, Benjamin
    Hogan, Aidan
    COMPANION OF THE WORLD WIDE WEB CONFERENCE, WWW 2023, 2023, : 238 - 241
  • [7] Attention-Based Vandalism Detection in OpenStreetMap
    Tempelmeier, Nicolas
    Demidova, Elena
    PROCEEDINGS OF THE ACM WEB CONFERENCE 2022 (WWW'22), 2022, : 643 - 651
  • [8] Fair Multilingual Vandalism Detection System for Wikipedia
    Trokhymovych, Mykola
    Aslam, Muniza
    Chou, Ai-Jou
    Baeza-Yates, Ricardo
    Saez-Trumper, Diego
    PROCEEDINGS OF THE 29TH ACM SIGKDD CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING, KDD 2023, 2023, : 4981 - 4990
  • [9] Vandalism Detection in OpenStreetMap via User Embeddings
    Li, Yinxiao
    Anderson, Jennings
    Niu, Yiqi
    PROCEEDINGS OF THE 30TH ACM INTERNATIONAL CONFERENCE ON INFORMATION & KNOWLEDGE MANAGEMENT, CIKM 2021, 2021, : 3232 - 3236
  • [10] WSDM Cup 2017: Vandalism Detection and Triple Scoring
    Heindorf, Stefan
    Potthast, Martin
    Bast, Hannah
    Buchhold, Bjoern
    Haussmann, Elmar
    WSDM'17: PROCEEDINGS OF THE TENTH ACM INTERNATIONAL CONFERENCE ON WEB SEARCH AND DATA MINING, 2017, : 827 - 828