Outlier detection methods to improve the quality of citizen science data

Cited by: 10
Authors
Li, Jennifer S. [1]
Hamann, Andreas [1]
Beaubien, Elisabeth [1]
Affiliations
[1] Univ Alberta, Dept Renewable Resources, Fac Agr Life & Environm Sci, 751 Gen Serv Bldg, Edmonton, AB T6G 2H1, Canada
Keywords
Citizen science; Data cleaning; Outlier detection; Data management; Plant phenology; Climate change; PLANT PHENOLOGY; ALBERTA; KNOWLEDGE; TOOL;
DOI
10.1007/s00484-020-01968-z
Chinese Library Classification number
Q6 [Biophysics]
Discipline classification code
071011
Abstract
Citizen science involves public participation in research, usually through volunteer observation and reporting. Data collected by citizen scientists are a valuable resource in many fields of research that require long-term observations at large geographic scales. However, such data may be perceived as less accurate than those collected by trained professionals. Here, we analyze the quality of data from a plant phenology network, which tracks biological response to climate change. We apply five algorithms designed to detect outlier observations or inconsistent observers. These methods rely on different quantitative approaches, including residuals of linear models, correlations among observers, deviations from multivariate clusters, and percentile-based outlier removal. We evaluated these methods by comparing the resulting cleaned datasets in terms of time series means, spatial data coverage, and spatial autocorrelations after outlier removal. Spatial autocorrelations were used to determine the efficacy of outlier removal, as they are expected to increase if outliers and inconsistent observations are successfully removed. All data cleaning methods resulted in better Moran's I autocorrelation statistics, with percentile-based outlier removal and the clustering method showing the greatest improvement. Methods based on residual analysis of linear models had the strongest impact on the final bloom time mean estimates, but were among the weakest based on autocorrelation analysis. Removing entire sets of observations from potentially unreliable observers proved least effective. In conclusion, percentile-based outlier removal emerges as a simple and effective method to improve reliability of citizen science phenology observations.
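The abstract's central idea, that removing outliers should raise spatial autocorrelation, can be illustrated with a minimal Python sketch. This is not the authors' procedure: the synthetic bloom dates, the 2.5th/97.5th percentile cutoffs, the binary distance weights, and the names `percentile_mask` and `morans_i` are all assumptions made for illustration.

```python
import numpy as np

def percentile_mask(values, lower=2.5, upper=97.5):
    """True for values inside the [lower, upper] percentile band (assumed cutoffs)."""
    lo, hi = np.percentile(values, [lower, upper])
    return (values >= lo) & (values <= hi)

def morans_i(values, coords, radius):
    """Global Moran's I with binary weights: 1 for distinct points within `radius`."""
    z = values - values.mean()
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    w = ((dist > 0) & (dist <= radius)).astype(float)
    # I = (n / sum of weights) * (z' W z) / (z' z)
    return len(values) / w.sum() * (z @ w @ z) / (z @ z)

# Synthetic data (hypothetical): bloom day follows a west-east gradient plus noise.
rng = np.random.default_rng(0)
n = 200
coords = rng.uniform(0, 10, size=(n, 2))
bloom = 120 + 3 * coords[:, 0] + rng.normal(0, 1, n)

# Inject ten gross errors: five far too late, five far too early.
bloom[:5] += 100
bloom[5:10] -= 100

keep = percentile_mask(bloom)
i_before = morans_i(bloom, coords, radius=1.5)
i_after = morans_i(bloom[keep], coords[keep], radius=1.5)
print(f"Moran's I before: {i_before:.3f}, after: {i_after:.3f}")
```

In this toy setup the percentile band drops exactly the ten injected errors, and Moran's I is higher on the cleaned data, which is the direction of change the abstract uses as its efficacy criterion. Real phenology data would typically need the filter applied within groups (for example per species and year) rather than on the pooled values.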
Pages: 1825-1833
Page count: 9
Related papers
  • [1] Outlier detection methods to improve the quality of citizen science data
    Li, Jennifer S.
    Hamann, Andreas
    Beaubien, Elisabeth
    International Journal of Biometeorology, 2020, 64 : 1825 - 1833
  • [2] Methods of Promoting Learning and Data Quality in Citizen and Community Science
    Herodotou, Christothea
    Scanlon, Eileen
    Sharples, Mike
    FRONTIERS IN CLIMATE, 2021, 3
  • [3] A survey on outlier detection methods applied on air quality data
    Stroia-Vlad, Iuliana-Andreea
    Danciu, Gabriel Mihail
    2020 14TH INTERNATIONAL SYMPOSIUM ON ELECTRONICS AND TELECOMMUNICATIONS (ISETC), 2020 : 23 - 26
  • [4] Open Citizen Science Data and Methods
    Hultquist, Carolynne
    de Sherbinin, Alex
    Bowser, Anne
    Schade, Sven
    FRONTIERS IN CLIMATE, 2022, 4
  • [5] Seven Primary Data Types in Citizen Science Determine Data Quality Requirements and Methods
    Stevenson, Robert D.
    Suomela, Todd
    Kim, Heejun
    He, Yurong
    FRONTIERS IN CLIMATE, 2021, 3
  • [6] Perspectives on Citizen Science Data Quality
    Downs, Robert R.
    Ramapriyan, Hampapuram K.
    Peng, Ge
    Wei, Yaxing
    FRONTIERS IN CLIMATE, 2021, 3
  • [7] Assessing data quality in citizen science
    Kosmala, Margaret
    Wiggins, Andrea
    Swanson, Alexandra
    Simmons, Brooke
    FRONTIERS IN ECOLOGY AND THE ENVIRONMENT, 2016, 14 (10) : 551 - 560
  • [8] Discussion of Outlier Detection Methods of Purchasing Data
    Kono, Katsuya
    Yamamoto, Yoshiro
    2016 14TH INTERNATIONAL CONFERENCE ON ICT AND KNOWLEDGE ENGINEERING (ICT&KE), 2016 : 12 - 18
  • [9] WMEVF: AN OUTLIER DETECTION METHODS FOR CATEGORICAL DATA
    Rokhman, Nur
    Subanar
    Winarko, Edi
    2016 INTERNATIONAL CONFERENCE ON INFORMATICS AND COMPUTING (ICIC), 2016 : 37 - 42
  • [10] Assessing the quality and trustworthiness of citizen science data
    Hunter, Jane
    Alabri, Abdulmonem
    van Ingen, Catharine
    CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, 2013, 25 (04) : 454 - 466