Vandalism Detection in Wikidata

Cited by: 47
Authors
Heindorf, Stefan [1]
Potthast, Martin [2]
Stein, Benno [2]
Engels, Gregor [1]
Affiliations
[1] Univ Paderborn, Paderborn, Germany
[2] Bauhaus Univ Weimar, Weimar, Germany
Keywords
Knowledge Base; Vandalism; Data Quality; Trust
DOI
10.1145/2983323.2983740
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Wikidata is the new, large-scale knowledge base of the Wikimedia Foundation. Its knowledge is increasingly used within Wikipedia itself and in various other kinds of information systems, imposing high demands on its integrity. Wikidata can be edited by anyone and, unfortunately, it is frequently vandalized, exposing all information systems that use it to the risk of spreading vandalized and falsified information. In this paper, we present a new machine learning-based approach to detect vandalism in Wikidata. We propose a set of 47 features that exploit both content and context information, and we report on 4 classifiers of increasing effectiveness tailored to this learning task. Our approach is evaluated on the recently published Wikidata Vandalism Corpus WDVC-2015, where it achieves an area under the receiver operating characteristic curve (ROCAUC) of 0.991. It significantly outperforms the state of the art, represented by the rule-based Wikidata Abuse Filter (0.865 ROCAUC) and a prototypical vandalism detector recently introduced by Wikimedia within the Objective Revision Evaluation Service (0.859 ROCAUC).
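The ROCAUC figures above can be read as the probability that a detector scores a randomly chosen vandalism edit higher than a randomly chosen regular edit. The following is a minimal sketch of that rank-based definition, assuming binary labels (1 = vandalism, 0 = regular edit) and one real-valued score per edit; the function name roc_auc and the toy data are hypothetical illustrations, not the authors' evaluation code or data from WDVC-2015.

```python
def roc_auc(labels, scores):
    """Hypothetical helper: ROC AUC as the fraction of (vandalism, regular)
    edit pairs in which the vandalism edit gets the higher score.
    Ties count as half a win. Pairwise O(P*N) version, chosen for clarity."""
    pos = [s for y, s in zip(labels, scores) if y == 1]  # vandalism scores
    neg = [s for y, s in zip(labels, scores) if y == 0]  # regular-edit scores
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example (made-up scores): 5 of the 6 positive/negative pairs are ranked correctly.
print(roc_auc([1, 1, 0, 0, 0], [0.9, 0.4, 0.5, 0.2, 0.1]))
```

A value of 1.0 means every vandalism edit outranks every regular edit, while 0.5 corresponds to random ranking; for large corpora a sort-based implementation would replace the quadratic loop.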
Pages: 327-336
Number of pages: 10