Outlier detection methods to improve the quality of citizen science data

Cited by: 10
Authors
Li, Jennifer S. [1 ]
Hamann, Andreas [1 ]
Beaubien, Elisabeth [1 ]
Affiliations
[1] Univ Alberta, Dept Renewable Resources, Fac Agr Life & Environm Sci, 751 Gen Serv Bldg, Edmonton, AB T6G 2H1, Canada
Keywords
Citizen science; Data cleaning; Outlier detection; Data management; Plant phenology; Climate change; Alberta; Knowledge; Tool
DOI
10.1007/s00484-020-01968-z
Chinese Library Classification (CLC)
Q6 [Biophysics]
Discipline classification code
071011
Abstract
Citizen science involves public participation in research, usually through volunteer observation and reporting. Data collected by citizen scientists are a valuable resource in many fields of research that require long-term observations at large geographic scales. However, such data may be perceived as less accurate than those collected by trained professionals. Here, we analyze the quality of data from a plant phenology network, which tracks biological response to climate change. We apply five algorithms designed to detect outlier observations or inconsistent observers. These methods rely on different quantitative approaches, including residuals of linear models, correlations among observers, deviations from multivariate clusters, and percentile-based outlier removal. We evaluated these methods by comparing the resulting cleaned datasets in terms of time series means, spatial data coverage, and spatial autocorrelations after outlier removal. Spatial autocorrelations were used to determine the efficacy of outlier removal, as they are expected to increase if outliers and inconsistent observations are successfully removed. All data cleaning methods resulted in better Moran's I autocorrelation statistics, with percentile-based outlier removal and the clustering method showing the greatest improvement. Methods based on residual analysis of linear models had the strongest impact on the final bloom time mean estimates, but were among the weakest based on autocorrelation analysis. Removing entire sets of observations from potentially unreliable observers proved least effective. In conclusion, percentile-based outlier removal emerges as a simple and effective method to improve reliability of citizen science phenology observations.
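The abstract's central idea, that percentile-based trimming should raise spatial autocorrelation, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the 5th/95th percentile cut-offs, the binary distance weights used for Moran's I, the function names `percentile_filter` and `morans_i`, and the synthetic transect data are all assumptions made for illustration only.

```python
import math
from statistics import mean, quantiles


def percentile_filter(records, lower_pct=5, upper_pct=95):
    """Keep records whose bloom day lies inside a percentile band.

    records: list of (x, y, day_of_year) tuples.
    The 5/95 cut-offs are illustrative assumptions, not values from the paper.
    """
    days = [day for _, _, day in records]
    cuts = quantiles(days, n=100, method="inclusive")  # 99 percentile cut points
    lo, hi = cuts[lower_pct - 1], cuts[upper_pct - 1]
    return [r for r in records if lo <= r[2] <= hi]


def morans_i(records, radius):
    """Moran's I with assumed binary weights: w_ij = 1 if two distinct sites
    lie within `radius` of each other, else 0."""
    days = [day for _, _, day in records]
    n = len(days)
    day_mean = mean(days)
    dev = [d - day_mean for d in days]
    cross, w_sum = 0.0, 0.0
    for i in range(n):
        for j in range(n):
            if i != j and math.dist(records[i][:2], records[j][:2]) <= radius:
                cross += dev[i] * dev[j]
                w_sum += 1.0
    denom = sum(d * d for d in dev)
    if w_sum == 0 or denom == 0:
        raise ValueError("Moran's I is undefined: no neighbours or no variance")
    return (n / w_sum) * (cross / denom)


# Synthetic transect (hypothetical data): bloom day rises smoothly along x,
# except for one implausible report at site 5.
records = [(float(x), 0.0, 100 + 2 * x) for x in range(11)]
records[5] = (5.0, 0.0, 200)

kept = percentile_filter(records)
before = morans_i(records, radius=1.0)
after = morans_i(kept, radius=1.0)
```

Note that two-sided trimming also discards the earliest legitimate observation in this toy example, which is the usual cost of a percentile rule. In practice the percentiles would presumably be computed within comparable groups (for example per species and region), but that grouping is likewise an assumption here.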
Pages: 1825-1833
Page count: 9