The detection and effect of social events on Wikipedia data-set for studying human preferences

Cited by: 0
Authors
Assuied, Julien [1]
Gandica, Yerali [2, 3]
Affiliations
[1] CY Tech Cergy Paris Univ, Cergy, France
[2] Univ Int Valencia VIU, Dept Math & Master Big Data, Valencia, Spain
[3] CY Cergy Paris Univ, CNRS, Lab Phys Theor & Modelisat, Cergy, France
Source
FRONTIERS IN BIG DATA | 2023 / Vol. 6
Keywords
human preferences; Wikipedia; outliers detection; possible bias; massive events
DOI
10.3389/fdata.2023.1077318
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Several studies have used the Wikipedia (WP) data-set to analyse worldwide human preferences by language. However, those studies could suffer from bias related to exceptional social circumstances. Any massive event that prompts an exceptional number of WP edits can be defined as a source of bias. In this article, we follow a procedure for detecting outliers. Our study is based on 12 languages and 13 different categories. Our methodology defines a parameter that is language-dependent instead of being externally fixed. We also study the presence of cyclic human behavior to evaluate apparent outliers. After our analysis, we found that the outliers in our data-set do not significantly affect the analysis of preferences by category among different WP languages. While investigating the possibility of bias related to exceptional social circumstances is always a safe measure before doing any analysis on Big Data, we found that, in the case of the first ten years of the Wikipedia data-set, outliers do not significantly affect the use of the Wikipedia data-set as a digital footprint to analyse worldwide human preferences.
Pages: 6