Outlier detection methods to improve the quality of citizen science data

Cited by: 10
Authors
Li, Jennifer S. [1 ]
Hamann, Andreas [1 ]
Beaubien, Elisabeth [1 ]
Affiliations
[1] Univ Alberta, Dept Renewable Resources, Fac Agr Life & Environm Sci, 751 Gen Serv Bldg, Edmonton, AB T6G 2H1, Canada
Keywords
Citizen science; Data cleaning; Outlier detection; Data management; Plant phenology; Climate change; PLANT PHENOLOGY; ALBERTA; KNOWLEDGE; TOOL;
DOI
10.1007/s00484-020-01968-z
Chinese Library Classification
Q6 [Biophysics];
Discipline classification code
071011 ;
Abstract
Citizen science involves public participation in research, usually through volunteer observation and reporting. Data collected by citizen scientists are a valuable resource in many fields of research that require long-term observations at large geographic scales. However, such data may be perceived as less accurate than those collected by trained professionals. Here, we analyze the quality of data from a plant phenology network, which tracks biological response to climate change. We apply five algorithms designed to detect outlier observations or inconsistent observers. These methods rely on different quantitative approaches, including residuals of linear models, correlations among observers, deviations from multivariate clusters, and percentile-based outlier removal. We evaluated these methods by comparing the resulting cleaned datasets in terms of time series means, spatial data coverage, and spatial autocorrelations after outlier removal. Spatial autocorrelations were used to determine the efficacy of outlier removal, as they are expected to increase if outliers and inconsistent observations are successfully removed. All data cleaning methods resulted in better Moran's I autocorrelation statistics, with percentile-based outlier removal and the clustering method showing the greatest improvement. Methods based on residual analysis of linear models had the strongest impact on the final bloom time mean estimates, but were among the weakest based on autocorrelation analysis. Removing entire sets of observations from potentially unreliable observers proved least effective. In conclusion, percentile-based outlier removal emerges as a simple and effective method to improve reliability of citizen science phenology observations.
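The abstract names percentile-based outlier removal and Moran's I without giving details, so the following is only a minimal sketch of how the two ideas fit together, not the authors' implementation. The function names `percentile_mask` and `morans_i`, the 2.5/97.5 percentile cut-offs, the binary distance-band weights, and the synthetic grid of bloom days are all assumptions made for illustration; the filter is applied to the whole dataset at once purely for simplicity.

```python
import numpy as np


def percentile_mask(values, lower=2.5, upper=97.5):
    """Boolean mask that keeps values inside the [lower, upper] percentile range."""
    values = np.asarray(values, dtype=float)
    lo, hi = np.percentile(values, [lower, upper])
    return (values >= lo) & (values <= hi)


def morans_i(values, coords, bandwidth):
    """Global Moran's I with binary weights: 1 if two sites lie within `bandwidth`."""
    values = np.asarray(values, dtype=float)
    coords = np.asarray(coords, dtype=float)
    # Pairwise Euclidean distances between all observation sites.
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    w = (dist <= bandwidth).astype(float)
    np.fill_diagonal(w, 0.0)  # a site is not its own neighbor
    z = values - values.mean()  # deviations from the mean
    return (len(values) / w.sum()) * (w * np.outer(z, z)).sum() / (z ** 2).sum()


# Synthetic (assumed) bloom days on a 10 x 10 grid: a smooth west-to-east gradient.
xs, ys = np.meshgrid(np.arange(10), np.arange(10))
coords = np.column_stack([xs.ravel(), ys.ravel()]).astype(float)
bloom_day = 100.0 + 3.0 * coords[:, 0]

# Two implausible reports, one far too late and one far too early.
bloom_day[44] = 250.0
bloom_day[55] = -50.0

keep = percentile_mask(bloom_day)
i_before = morans_i(bloom_day, coords, bandwidth=1.1)
i_after = morans_i(bloom_day[keep], coords[keep], bandwidth=1.1)

print(f"observations kept: {keep.sum()} of {keep.size}")
print(f"Moran's I before cleaning: {i_before:.3f}")
print(f"Moran's I after cleaning:  {i_after:.3f}")
```

In this toy case the two extreme reports inflate the variance and depress the autocorrelation statistic, so removing them raises Moran's I, which is the direction of change the abstract uses as its indicator of successful cleaning.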
Pages: 1825-1833
Page count: 9