Outlier detection methods to improve the quality of citizen science data

Cited by: 10

Authors:

Li, Jennifer S. [1]

Hamann, Andreas [1]

Beaubien, Elisabeth [1]

Affiliations:

[1] Univ Alberta, Dept Renewable Resources, Fac Agr Life & Environm Sci, 751 Gen Serv Bldg, Edmonton, AB T6G 2H1, Canada

Source:

INTERNATIONAL JOURNAL OF BIOMETEOROLOGY | 2020 / Vol. 64 / Issue 11

Keywords:

Citizen science; Data cleaning; Outlier detection; Data management; Plant phenology; Climate change; PLANT PHENOLOGY; ALBERTA; KNOWLEDGE; TOOL;

DOI:

10.1007/s00484-020-01968-z

Chinese Library Classification number:

Q6 [Biophysics];

Discipline classification code:

071011 ;

Abstract:

Citizen science involves public participation in research, usually through volunteer observation and reporting. Data collected by citizen scientists are a valuable resource in many fields of research that require long-term observations at large geographic scales. However, such data may be perceived as less accurate than those collected by trained professionals. Here, we analyze the quality of data from a plant phenology network, which tracks biological response to climate change. We apply five algorithms designed to detect outlier observations or inconsistent observers. These methods rely on different quantitative approaches, including residuals of linear models, correlations among observers, deviations from multivariate clusters, and percentile-based outlier removal. We evaluated these methods by comparing the resulting cleaned datasets in terms of time series means, spatial data coverage, and spatial autocorrelations after outlier removal. Spatial autocorrelations were used to determine the efficacy of outlier removal, as they are expected to increase if outliers and inconsistent observations are successfully removed. All data cleaning methods resulted in better Moran's I autocorrelation statistics, with percentile-based outlier removal and the clustering method showing the greatest improvement. Methods based on residual analysis of linear models had the strongest impact on the final bloom time mean estimates, but were among the weakest based on autocorrelation analysis. Removing entire sets of observations from potentially unreliable observers proved least effective. In conclusion, percentile-based outlier removal emerges as a simple and effective method to improve reliability of citizen science phenology observations.
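
The evaluation idea in the abstract (trim outliers by percentile, then check whether spatial autocorrelation rises) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the 2.5/97.5 percentile cut-offs, the binary distance-band weights, the function names, and the synthetic 10 x 10 grid of bloom days are all assumptions made for illustration only.

```python
import numpy as np


def percentile_mask(values, lower=2.5, upper=97.5):
    """Boolean mask keeping values within the [lower, upper] percentile range.

    The cut-offs are hypothetical; the abstract does not state which were used.
    """
    values = np.asarray(values, dtype=float)
    lo, hi = np.percentile(values, [lower, upper])
    return (values >= lo) & (values <= hi)


def morans_i(values, coords, bandwidth=1.0):
    """Global Moran's I using binary distance-band weights (an assumed scheme)."""
    x = np.asarray(values, dtype=float)
    xy = np.asarray(coords, dtype=float)
    # Pairwise Euclidean distances between observation sites.
    dist = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=-1)
    w = (dist <= bandwidth).astype(float)
    np.fill_diagonal(w, 0.0)  # a site is not its own neighbour
    z = x - x.mean()
    return (len(x) / w.sum()) * (z @ w @ z) / (z @ z)


# Synthetic example: bloom day follows a smooth spatial gradient on a grid.
cols, rows = np.meshgrid(np.arange(10), np.arange(10))
coords = np.column_stack([cols.ravel(), rows.ravel()])
bloom_day = 120.0 + 3.0 * coords[:, 0] + 1.0 * coords[:, 1]

# Inject four implausible reports at non-adjacent sites (made-up values).
outlier_idx = np.array([12, 37, 58, 81])
bloom_day[outlier_idx] = [10.0, 300.0, 5.0, 290.0]

mask = percentile_mask(bloom_day)
i_before = morans_i(bloom_day, coords)
i_after = morans_i(bloom_day[mask], coords[mask])

print(f"kept {mask.sum()} of {mask.size} observations")
print(f"Moran's I before: {i_before:.3f}, after: {i_after:.3f}")
```

Percentile trimming also discards a few valid extreme observations, so the cut-offs trade data coverage against cleanliness; a real analysis would likely apply the filter per species and year rather than to one pooled vector as done here.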

Pages: 1825 - 1833

Number of pages: 9

50 items in total

[31] Qualitocracy: A Data Quality Collaborative Framework Applied to Citizen Science
Antelio, Marcio
Esteves, Maria Gilda P.
Schneider, Daniel
de Souza, Jano Moreira
PROCEEDINGS 2012 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN, AND CYBERNETICS (SMC), 2012, : 931 - 936
[32] Outlier Detection for Improved Data Quality and Diversity in Dialog Systems
Larson, Stefan
Mahendran, Anish
Lee, Andrew
Kummerfeld, Jonathan K.
Hill, Parker
Laurenzano, Michael A.
Hauswald, Johann
Tang, Lingjia
Mars, Jason
2019 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES (NAACL HLT 2019), VOL. 1, 2019, : 517 - 527
[33] Outlier detection and missing data filling methods for coastal water temperature data
Cho, Hong Yeon
Oh, Ji Hee
Kim, Kyeong Ok
Shim, Jae Seol
JOURNAL OF COASTAL RESEARCH, 2013, : 1898 - 1903
[34] An Approach to Improve the Quality of User-Generated Content of Citizen Science Platforms
Musto, Jiri
Dahanayake, Ajantha
ISPRS INTERNATIONAL JOURNAL OF GEO-INFORMATION, 2021, 10 (07)
[35] Alternative Methods and Citizen Science
Caloni, Francesca
Fossati, Paola
Hartung, Thomas
Martino, Piera Anna
Mormino, Gianfranco
Vitale, Augusto
Angelis, Isabella De
ALTEX-ALTERNATIVES TO ANIMAL EXPERIMENTATION, 2022, 39 (01) : 159 - 160
[36] Methods for outlier detection in prediction
Pierna, JAF
Wahl, F
de Noord, OE
Massart, DL
CHEMOMETRICS AND INTELLIGENT LABORATORY SYSTEMS, 2002, 63 (01) : 27 - 39
[37] Methods for evaluating volunteers' contributions in a deforestation detection citizen science project
Arcanjo, Jeferson S.
Luz, Eduardo F. P.
Fazenda, Alvaro L.
Ramos, Fernando M.
FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE, 2016, 56 : 550 - 557
[38] An analysis of fossil identification guides to improve data reporting in citizen science programs
Butler, Dava K.
Esker, Donald A.
Juntunen, Kristopher L.
Lawver, Daniel R.
PALAEONTOLOGIA ELECTRONICA, 2020, 23 (01) : 1 - 21
[39] Estimates of observer expertise improve species distributions from citizen science data
Johnston, Alison
Fink, Daniel
Hochachka, Wesley M.
Kelling, Steve
METHODS IN ECOLOGY AND EVOLUTION, 2018, 9 (01) : 88 - 97
[40] A COMPARATIVE STUDY FOR OUTLIER DETECTION METHODS IN HIGH DIMENSIONAL TEXT DATA
Park, Cheong Hee
JOURNAL OF ARTIFICIAL INTELLIGENCE AND SOFT COMPUTING RESEARCH, 2023, 13 (01) : 5 - 17