The detection and effect of social events on Wikipedia data-set for studying human preferences

Authors
Assuied, Julien [1]
Gandica, Yerali [2, 3]
Affiliations
[1] CY Tech Cergy Paris Univ, Cergy, France
[2] Univ Int Valencia VIU, Dept Math & Master Big Data, Valencia, Spain
[3] CY Cergy Paris Univ, CNRS, Lab Phys Theor & Modelisat, Cergy, France
Source
FRONTIERS IN BIG DATA | 2023 / Vol. 6
Keywords
human preferences; Wikipedia; outliers detection; possible bias; massive events
DOI
10.3389/fdata.2023.1077318
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Several studies have used the Wikipedia (WP) data-set to analyse worldwide human preferences by language. However, those studies could suffer from bias related to exceptional social circumstances: any massive event that triggers exceptional editing activity on WP can be a source of bias. In this article, we follow a procedure for detecting outliers. Our study is based on 12 languages and 13 different categories. Our methodology defines a language-dependent parameter instead of an externally fixed one. We also study the presence of human cyclic behavior to evaluate apparent outliers. We found that the outliers in our data-set do not significantly affect the analysis of preferences by category across different WP languages. Although investigating possible bias from exceptional social circumstances is always a prudent step before any analysis of Big Data, we found that, for the first ten years of the Wikipedia data-set, outliers do not significantly affect the use of that data-set as a digital footprint for analysing worldwide human preferences.
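The abstract does not spell out the outlier procedure itself. As a rough illustration only, and not the authors' method, the sketch below flags months of unusually high editing activity in one language's edit-count series using a robust z-score (median and scaled MAD). This way the cutoff is expressed in units of that language's own spread rather than as a fixed raw edit count. It then discards flags that recur one period earlier or later, treating them as cyclic behaviour rather than genuine outliers. The one-sided test, the multiplier k = 3.5, the 12-month period, the synthetic data and the function names are all assumptions made for this example.

```python
import statistics


def robust_scores(counts):
    """Robust z-scores: distance from the median in units of the scaled MAD."""
    med = statistics.median(counts)
    mad = statistics.median([abs(c - med) for c in counts])
    # 1.4826 * MAD approximates the standard deviation for normal data;
    # fall back to 1.0 when MAD is zero to avoid division by zero.
    scale = 1.4826 * mad if mad > 0 else 1.0
    return [(c - med) / scale for c in counts]


def non_cyclic_outliers(counts, k=3.5, period=12):
    """Indices of months with unusually high activity that do not recur one period away.

    Hypothetical assumptions (not from the paper): one-sided test for spikes,
    cutoff k in units of this language's own robust spread, yearly period.
    """
    scores = robust_scores(counts)
    high = {i for i, s in enumerate(scores) if s > k}
    # A spike that also appears one period earlier or later is treated as cyclic.
    return sorted(i for i in high if i - period not in high and i + period not in high)


if __name__ == "__main__":
    # Synthetic monthly edit counts for one language and category (made-up data).
    monthly = [100 if i % 2 == 0 else 110 for i in range(36)]
    for month in (5, 17, 29):  # yearly recurring peak
        monthly[month] = 170
    monthly[20] = 400  # one-off burst
    print(non_cyclic_outliers(monthly))  # [20]
```

In this toy series the yearly peaks at months 5, 17 and 29 exceed the cutoff but are discarded as cyclic, while the one-off burst at month 20 remains flagged.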
Pages: 6